Abstract

We study the universality of spectral statistics of large random matrices. We consider $N \times N$ symmetric,
hermitian or quaternion self-dual random matrices with independent, identically distributed entries
(Wigner matrices) where the probability distribution of each matrix element is given by a measure $\nu$ with
zero expectation and with a subexponential decay. Our main result is that the correlation functions of
the local eigenvalue statistics in the bulk of the spectrum coincide with those of the Gaussian Orthogonal
Ensemble (GOE), the Gaussian Unitary Ensemble (GUE) and the Gaussian Symplectic Ensemble (GSE),
respectively, in the limit $N \to \infty$. Our approach is based on the study of the Dyson Brownian motion
via a related new dynamics, the local relaxation flow.

As a main input, we establish that the density of eigenvalues converges to the Wigner semicircle law 
and this holds even down to the smallest possible scale, and, moreover, we show that eigenvectors are 
fully delocalized. These results hold even without the condition that the matrix elements are identically 
distributed, only independence is used. In fact, we give strong estimates on the matrix elements of the 
Green function as well that imply that the local statistics of any two ensembles in the bulk are identical if 
the first four moments of the matrix elements match. Universality at the spectral edges requires matching 
only two moments. We also prove a Wegner type estimate and that the eigenvalues repel each other on 
arbitrarily small scales. 
1 Introduction



This survey is based upon the lecture notes that the author has prepared for the participants at the Arizona 
School of Analysis and Applications, Tucson, AZ in 2010. The style of the presentation is closer to the 
informal style of a lecture than to a formal research article. For the details and sometimes even for the 
precise formulation we refer to the original papers. 

In the first introductory section we give an overview about universality of random matrices, including 
previous results, history and motivations. We introduce some basic concepts such as Wigner matrix, Wigner 
semicircle law, Stieltjes transform, moment method, sine-kernel, gap distribution, level repulsion, bulk and 
edge universality, invariant ensembles, Green function comparison theorem, four moment theorem, local
relaxation flow and reverse heat flow. Some of these concepts will not be used for our main results, but 
we included them to help the orientation of the reader. The selection of the material presented in the first 
section admittedly reflects a personal bias of the author and it is not meant to be comprehensive. It is
focused on the background material for the later sections where we present our recent results on universality 
of random matrices. 

There are several very active research directions connected with random matrices that are not mentioned 
in this survey at all, e.g. supersymmetric methods or connection with free probability. Some other very
rich topics, e.g. edge universality with the moment method or connections with orthogonal polynomials, 
are mentioned only superficially. We refer the reader to more comprehensive surveys on random matrices, 
especially the classical book of Mehta [70], the survey of the Riemann-Hilbert approach of Deift [22], the
recent book of Anderson, Guionnet and Zeitouni [4] and the forthcoming book of Forrester [53]. An excellent 
short summary about the latest developments is by Guionnet [60]. 

Starting from Section 2, we present our recent results that give the shortest and up to now the most
powerful approach to the bulk universality for $N \times N$ Wigner matrices. One of the main results is formulated
in Theorem 5.2. These results were obtained in collaboration with J. Ramírez, S. Péché, B. Schlein, H.-T.
Yau and J. Yin; see the bibliography for precise references. In this part we strive for mathematical rigor, 
but several details will be referred to the original papers. The argument has three distinct steps: 

1. Local semicircle law (Section 2); 

2. Universality for Gaussian convolutions via the local relaxation flow (Section 3); 

3. Green function comparison theorem (Section 4). 

Finally, in Section 5, we put together the proof from these ingredients. The main result on universality of 
local statistics in the bulk for Wigner matrices is formulated in Theorem 5.1. Some technical lemmas are 
collected in the Appendices that can be neglected at first reading. 

Convention: Throughout the paper the letters $C$ and $c$ denote positive constants whose values may
change from line to line and they are independent of the relevant parameters. Since we will always take the
$N \to \infty$ limit at the end, all estimates are understood for sufficiently large $N$. In informal explanations we
will often neglect logarithmic factors, by introducing the notation $\lesssim$ and $\ll$ to indicate inequality "up to
some $\log N$ factor". More precisely, $A \lesssim B$ means $A \le (\log N)^{C} B$ with some non-negative constant $C$, and
$A \ll B$ means $A \le (\log N)^{-C} B$ with some positive constant $C$.

Acknowledgement. The author thanks H.-T. Yau for suggestions to improve the presentation of this 
overview. 
1.1 Summary of the main results: an orientation for the reader

We will consider $N$ by $N$ matrices $H = (h_{ij})_{i,j=1}^N$ whose entries are real or complex random variables. In
most cases we assume that $H$ is hermitian or symmetric, but our method applies to other ensembles as well
(our results for matrices with quaternion entries will not be discussed here, see [46]). We assume that the
entries are independent up to the symmetry constraint, $h_{ij} = \bar h_{ji}$, they are centered, $\mathbb{E} h_{ij} = 0$, and their
tail probability has a uniform subexponential decay (see (2.32) later). We do not assume that the matrix
elements are identically distributed but we assume that the variances, $\sigma_{ij}^2 := \mathbb{E}|h_{ij}|^2$, satisfy the normalization
condition

$$\sum_{j=1}^N \sigma_{ij}^2 = 1, \qquad i = 1,2,\ldots,N, \qquad (1.1)$$

i.e., the deterministic $N \times N$ matrix of variances, $\Sigma = (\sigma_{ij}^2)$, is symmetric and doubly stochastic. These
conditions guarantee that $-1 \le \Sigma \le 1$. We will always assume that 1 is a simple eigenvalue of $\Sigma$ and there
is a positive number $\delta_- > 0$ such that $-1 + \delta_- \le \Sigma$. This assumption is satisfied for practically any random
matrix ensemble. Sometimes we will need a uniform gap condition, i.e. that there exists a positive $\delta_+ > 0$
such that

$$\mathrm{Spec}\, \Sigma \subset [-1 + \delta_-, 1 - \delta_+] \cup \{1\}.$$

For example, for the standard Wigner matrix $\sigma_{ij}^2 = N^{-1}$ and $\delta_- = \delta_+ = 1$. For random band matrices (see
(1.18) for the precise definition) with band width $W$ satisfying $1 \ll W \ll N$, the gap $\delta_+$ goes to zero as the
size of the matrix increases.

The normalization (1.1) ensures that the bulk of the spectrum of $H$ lies in the interval $[-2,2]$ and the
density of eigenvalues $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_N$ is given by the Wigner semicircle law as $N \to \infty$. Apart from the
vicinity of the edges $\pm 2$, the typical spacing of neighboring eigenvalues is of order $1/N$. We are interested
in the statistics of the eigenvalues in the $N \to \infty$ limit.



1.1.1 Summary of Section 2: Main results on the local semicircle law 

In Section 2 we prove that the density of eigenvalues follows the semicircle law down to the smallest possible
scale, i.e., to scales only slightly larger than $1/N$. We will call it local semicircle law. The local semicircle
law is identified via the Stieltjes transform of the empirical density of the eigenvalues,

$$m(z) := m_N(z) = \frac{1}{N}\sum_{j=1}^N \frac{1}{\lambda_j - z}, \qquad z = E + i\eta, \quad E \in \mathbb{R}, \quad \eta > 0,$$

and we show that $m_N(z)$ converges to the Stieltjes transform of the semicircle density

$$m_{sc}(z) := \int_{\mathbb{R}} \frac{\varrho_{sc}(x)\,dx}{x - z}, \qquad \varrho_{sc}(x) := \frac{1}{2\pi}\sqrt{(4 - x^2)_+},$$

in the limit $N \to \infty$. The imaginary part $\eta = \mathrm{Im}\, z$ may depend on $N$ and it corresponds to the local scale
on which the density is identified. The precision of our approximation is of order $(N\eta)^{-1}$. Our best result
in this direction is Theorem 2.1 of [50], which we will call the strong local semicircle law:

$$|m(z) - m_{sc}(z)| \le \frac{C(\log N)^L}{N\eta}, \qquad (1.2)$$

for some sufficiently large $L$ and with a very high probability (see Section 2.4). This result holds even for
a more general class of Wigner matrices whose variances are comparable (see (1.17) for precise definition); the
key input is that in this case we have $\delta_+ > 0$.

For even more general Wigner matrices (they will be called universal Wigner matrices, see Definition 1.1
later), the key quantity that measures the precision is the spread of the matrix defined by

$$M := \frac{1}{\max_{ij} \sigma_{ij}^2}. \qquad (1.3)$$

For typical random band matrices (see (1.18) for the precise definition), $M$ is comparable with the band
width $W$. If $M \ll N$, then the precision of our estimates is determined by $M$ instead of $N$, for example, in
[49] we obtain

$$|m(z) - m_{sc}(z)| \le \ldots, \qquad \kappa := \big| |E| - 2 \big|, \qquad (1.4)$$

for any $\varepsilon > 0$, with a very high probability (see Theorem 2.5 later), or

$$|m(z) - m_{sc}(z)| \le \frac{C(\log N)^{\ldots}}{\ldots} \qquad (1.5)$$

was proven in Theorem 2.1 of [48]. Note that these estimates deteriorate near the spectral edges.

It is well known that the identification of the Stieltjes transform of a measure for the complex parameters
$z = E + i\eta$, $E \in \mathbb{R}$, is equivalent to knowing the density down to scales essentially $O(\eta)$, thus we obtain the
control on the density down to scales essentially of order $\eta \sim 1/M$.
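
As a numerical illustration of this correspondence, the following Python sketch evaluates the Stieltjes transform of the semicircle density and compares its imaginary part at $z = E + i\eta$, divided by $\pi$, with the density at $E$ for decreasing $\eta$. It relies on the standard closed form $m_{sc}(z) = \frac{1}{2}(-z + \sqrt{z^2 - 4})$, taken as the root of $m^2 + zm + 1 = 0$ with positive imaginary part; this formula is not stated in the text above, and the function names and the midpoint quadrature are illustrative choices only.

```python
import cmath
import math


def rho_sc(x):
    """Semicircle density (1 / (2*pi)) * sqrt((4 - x^2)_+)."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)


def m_sc(z):
    """Closed form of the Stieltjes transform of rho_sc for Im z > 0.

    It is the root of m^2 + z*m + 1 = 0 with positive imaginary part.
    """
    root = cmath.sqrt(z * z - 4.0)
    m = (-z + root) / 2.0
    return m if m.imag > 0 else (-z - root) / 2.0


def m_sc_quadrature(z, steps=20000):
    """Midpoint-rule approximation of the integral of rho_sc(x) / (x - z) over [-2, 2]."""
    h = 4.0 / steps
    total = 0j
    for k in range(steps):
        x = -2.0 + (k + 0.5) * h
        total += rho_sc(x) / (x - z)
    return total * h


if __name__ == "__main__":
    E = 0.5
    for eta in (1.0, 0.1, 0.01):
        z = complex(E, eta)
        # Im m_sc(E + i*eta) / pi is the density smoothed on scale eta.
        print(eta, m_sc(z).imag / math.pi, rho_sc(E))
```

The smaller $\eta$ is, the closer the printed smoothed value should be to the density itself, which is the sense in which control of the Stieltjes transform at imaginary part $\eta$ resolves the density on scale $\eta$.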

The Stieltjes transform $m(z)$ can also be viewed as the normalized trace of the resolvent,

$$m(z) = \frac{1}{N}\mathrm{Tr}\, G(z) = \frac{1}{N}\sum_{i=1}^N G_{ii}(z), \qquad G(z) := \frac{1}{H - z}.$$

In addition to (1.2), we are able to prove that not only the sum, but each diagonal element $G_{ii}(z)$ is given
by the semicircle law, but the precision is weaker:

$$\max_i |G_{ii}(z) - m_{sc}(z)| \lesssim \frac{C}{\sqrt{N\eta}}, \qquad z = E + i\eta. \qquad (1.6)$$

Finally, we can also show that the off-diagonal resolvent elements are small:

$$\max_{i \ne j} |G_{ij}(z)| \lesssim \frac{C}{\sqrt{N\eta}} \qquad (1.7)$$

with logarithmic corrections [50] (see Theorem 2.19 in Section 2.4). In our previous papers, [48, 49], the
constant $C$ in (1.6) and (1.7) depended on $\kappa$, i.e. the estimates deteriorated near the spectral edge as an
inverse power of $\kappa$; the exponent depends on whether a positive uniform lower bound $\delta_+ > 0$ is available
or not. For more general Wigner matrices, e.g. for band matrices, we obtain the same estimates but $M$
replaces $N$ on the right hand sides of (1.6) and (1.7) and $C$ depends on $\kappa$. The precise statements are given
in Theorem 2.5.

The asymptotics of the Stieltjes transform can be translated into the asymptotics of the counting function 
(e.g. Theorem 2.6) or into a result on the location of the eigenvalues (Theorem 2.7). Moreover, the local 
semicircle law easily implies that the eigenvectors are fully delocalized (see Section 2.5). 
1.1.2 Summary of Section 3: Main results on the bulk universality with Gaussian component

Bulk universality refers to the fact that local eigenvalue statistics, i.e., correlation functions of eigenvalues 
rescaled by a factor N, or the distribution of the gap between consecutive eigenvalues exhibit universal 
behavior which is solely determined by the symmetry class of the ensemble. 

Bulk universality was first proved for Gaussian Wigner ensembles, i.e., when the matrix elements $h_{ij}$
are i.i.d. Gaussian random variables, by Dyson [32] and Mehta [71]. The Gaussian character makes explicit
calculations easier that are needed to identify the limiting correlation functions (e.g. the celebrated sine 
kernel for the hermitian case). The key fact is that the joint distribution function for the eigenvalues of such 
ensembles is explicit and it contains a Vandermonde determinant structure from which the local universality 
can be deduced, see (1.44). 

It is a natural idea to consider a broader class of matrices that still have some Gaussian character; the 
useful concept is the Gaussian divisible ensembles, i.e., where the probability law of each matrix element
contains a Gaussian component (Gaussian convolution). 

One approach with Gaussian convolutions is to push the explicit calculations further by finding a similar 
Vandermonde structure. Based upon an earlier paper of Brézin and Hikami [17], Johansson [64] has found
a representation formula for correlation functions and he was able to prove bulk universality for Gaussian 
divisible matrices. For an algebraic reason, this method is available for the hermitian case only. 

The size of the Gaussian component in [64] was substantial; it was of the same order as the non-Gaussian 
part. Using our local semicircle law and a slightly modified version of an explicit representation formula of 
Johansson [64] we were able to prove bulk universality for hermitian Wigner matrices with a tiny Gaussian 
component of variance $O(N^{\ldots})$ with an improved formula in [44] (Section 1.6.1).

The second approach (sketched in Section 1.6.2 and elaborated in Section 3) is to embed the Gaussian 
divisible ensemble into a stochastic flow of matrices, and use the key observation of Dyson [31] that under 
this flow the eigenvalues perform a specific stochastic dynamics with a logarithmic interaction, the celebrated 
Dyson Brownian Motion. Eventually the dynamics relaxes to equilibrium, which is the well known Gaussian 
model (GUE, GOE or GSE). The main idea is that the local relaxation is much faster, i.e., the local statistics 
of eigenvalues already reach their equilibrium within a very short $t = N^{-\varepsilon}$ time (with an explicit $\varepsilon > 0$). In
fact, Dyson [31] has predicted that the time scale to local equilibrium is of order $N^{-1}$, which we eventually
proved in [50] . Our main result states that the local correlation functions of Gaussian divisible matrices with 
a small Gaussian component coincide with the correlation functions of the purely Gaussian ensembles. 

This result can be formulated in a general setup and viewed as a strong local ergodicity of the Dyson
Brownian motion or, in fact, of any one dimensional stochastic particle dynamics with logarithmic interaction. 
This general formulation appeared first in [46] and it will be given in Theorem 3.3, but most of the key ideas 
were invented in [42]. For the application of this general principle to random matrices one needs certain 
a priori information about the location of the eigenvalues, which we obtain from the local semicircle law. In
particular, using this idea, the bulk universality for symmetric Wigner matrices was first proved in [42]. The 
cases of quaternion self-dual and sample covariance matrices were treated in [46] . 

1.1.3 Summary of Section 4: Main results on the removal of the Gaussian component 

To prove the universality of any Wigner ensemble, we need to compare it with a Gaussian divisible ensemble 
for which universality has already been proven. Such comparison principle is plausible if the Gaussian 
component is small and indeed a perturbation argument can be applied. It is essentially a density argument, 
stating that Gaussian divisible ensembles are sufficiently "dense" in the space of all Wigner ensembles. 
The first result of this type used a reversed heat flow argument [44], where we showed that any smooth 
distribution can be approximated with a very high precision by a Gaussian divisible distribution. Combining
this method with the bulk universality for hermitian Wigner matrices with a Gaussian component of variance
of order $O(N^{\ldots})$, we were able to prove bulk universality for any Wigner ensemble under the condition
that the distribution of the matrix elements is smooth. 

A more robust approach is the Green function comparison theorem from [48], which states that for two 
matrix ensembles, the joint distribution of the Green functions coincides, provided that the first four moments 
of the probability law of the matrix elements are identical or very close. The spectral parameter $z$ can have
a very small imaginary part $\mathrm{Im}\, z \sim N^{-1-\varepsilon}$, i.e., these Green functions can detect individual eigenvalues.
The precise statement is given in Theorem 4.1. The key input is the local semicircle law involving individual 
matrix elements of the resolvent, (1.6)-(1.7). 

The combination of the results of Section 1.1.2 on the bulk universality of Gaussian divisible matrices 
and the Green function comparison theorem gives the bulk universality for any Wigner ensemble by a simple 
matching argument [49]. The method applies even to matrices with comparable variances (1.17). The only 
condition in the approach is a subexponential decay for the tail of the probability law of the matrix elements 
(2.32). In fact, this condition can be relaxed to a sufficiently fast polynomial decay, but for simplicity we 
will not pursue this direction. 

The four moment condition was first observed by Tao and Vu [96] in the four moment theorem for 
eigenvalues (Theorem 1.5). Their key technical input is also the local semicircle law and its corollary on 
delocalization of eigenvectors. They used this result to prove the universality for hermitian Wigner matrices 
without smoothness condition but under some moment and support condition, that especially excluded the 
Bernoulli distribution. The bulk universality for hermitian Wigner matrices including the Bernoulli case
was first proved in [45] after combining the results of [44] and [96].

Finally, in Section 5 we state the main result (Theorem 5.1) on bulk universality and we summarize how 
its proof follows from the previous sections. Currently the local relaxation flow method combined with the 
Green function comparison theorem gives the most general approach to bulk universality. This path not 
only proves bulk universality for general Wigner ensembles, but it also offers a conceptual understanding 
of how universality emerges from simple principles.

In Sections 1.2-1.7 we review several facts, results and methods in connection with Wigner random 
matrices and some related ensembles. These sections are meant to provide general background information.
In Section 1.6 we also explain the key new ideas listed above in more detail and give a summary of various
results on bulk universality. A reader wishing to focus only on the most recent developments can skip 
Sections 1.2-1.7 and jump to Section 2. 

1.2 Wigner matrix ensemble 

A central question in probability theory is the universality of cumulative statistics of a large set of independent 
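data. Given an array of $N$ independent random variables

$$(X_1, X_2, \ldots, X_N) \qquad (1.8)$$

one forms linear statistics like the mean or the fluctuation

$$\bar X^{(N)} := \frac{1}{N}\sum_{j=1}^N X_j \qquad \text{or} \qquad S^{(N)} := \frac{1}{\sqrt N}\sum_{j=1}^N (X_j - \mathbb{E} X_j). \qquad (1.9)$$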
Under very general conditions, a universal pattern emerges as $N \to \infty$: the mean converges to its expectation,
in particular, it becomes deterministic,

$$\bar X^{(N)} \to \lim_{N \to \infty} \mathbb{E} \bar X^{(N)},$$

assuming that the latter limit exists (law of large numbers). Moreover, the fluctuation $S^{(N)}$ converges to a
centered Gaussian random variable $\xi$

$$S^{(N)} \to \xi \qquad \text{(in distribution)} \qquad (1.10)$$

(central limit theorem), i.e., the density function of $\xi$ is given by $f(x) = (\sqrt{2\pi}\sigma)^{-1} \exp(-x^2/2\sigma^2)$. The
variance $\sigma^2$ of $\xi$ is given by the average of the variances of $X_j$,

$$\sigma^2 := \lim_{N \to \infty} \frac{1}{N}\sum_{j=1}^N \sigma_j^2, \qquad \sigma_j^2 := \mathbb{E}|X_j - \mathbb{E} X_j|^2.$$

In particular, for independent, identically distributed (i.i.d.) random variables, $\bar X^{(N)}$ converges to the
common expectation value of $X_j$'s, and $S^{(N)}$ converges to the centered normal distribution with the common
variance of $X_j$'s.

The emergence of a single universal distribution, the Gaussian, is a remarkable fact of Nature. It shows 
that large systems with many independent components in a certain sense behave identically, irrespective of 
the details of the distributions of the components. 
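
The universality expressed by the central limit theorem can be checked with a short simulation. The Python sketch below forms the fluctuation $S^{(N)}$ from clearly non-Gaussian variables and compares its empirical mean, variance and the probability of $|S^{(N)}| \le 1$ with the Gaussian values 0, 1 and about 0.6827. The choice of uniform variables on $[-\sqrt 3, \sqrt 3]$ (centered, variance 1), the sample sizes, the seed and all function names are assumptions made for this illustration.

```python
import math
import random


def fluctuation(n, rng):
    """S^(N) = N^{-1/2} * sum_j (X_j - E X_j) for X_j uniform on [-sqrt(3), sqrt(3)].

    These X_j are centered with variance 1 and are far from Gaussian.
    """
    a = math.sqrt(3.0)
    return sum(rng.uniform(-a, a) for _ in range(n)) / math.sqrt(n)


def sample_fluctuations(n=200, samples=2000, seed=0):
    """Draw independent copies of S^(N) with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [fluctuation(n, rng) for _ in range(samples)]


def summarize(values):
    """Return (mean, variance, fraction of values in [-1, 1])."""
    k = len(values)
    mean = sum(values) / k
    var = sum((v - mean) ** 2 for v in values) / k
    inside = sum(1 for v in values if abs(v) <= 1.0) / k
    return mean, var, inside


if __name__ == "__main__":
    print(summarize(sample_fluctuations()))
```

Replacing the uniform distribution by any other centered distribution with unit variance should leave the three summary values essentially unchanged, up to sampling error.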

It is natural to generalize this question of universality from arrays (1.8) to double arrays, i.e., to matrices:

$$X^{(N,M)} = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1N} \\ X_{21} & X_{22} & \ldots & X_{2N} \\ \vdots & \vdots & & \vdots \\ X_{M1} & X_{M2} & \ldots & X_{MN} \end{pmatrix} \qquad (1.11)$$

with independent entries. The statistics in question should involve a quantity which reflects the matrix
character and is influenced by all entries, for example the (Euclidean) norm of $X^{(N,M)}$. Although the norm
of each random realization of $X^{(N,M)}$ may differ, it is known, for example, that in the limit as $N, M \to \infty$,
such that $N/M \to d$, $0 < d \le 1$ is fixed, it becomes deterministic, e.g. we have [69, 105]

$$\frac{1}{\sqrt M}\|X^{(N,M)}\| \to \sigma(1 + \sqrt d) \qquad (1.12)$$

assuming that the matrix elements are centered, $\mathbb{E} X_{ij} = 0$, and their average variance is $\sigma^2$. Note that
the typical size of $\|X^{(N,M)}\|$ is only of order $\sqrt M$ despite that the matrix has dimensions $M \times N$ filled with
elements of size $O(1)$. If the matrix elements were strongly correlated then the norm could be of order $M$. For
example, in the extreme case, if all elements were the same, $X_{ij} = X$, then $\|X^{(N,M)}\| \sim M$. Independence
of the matrix elements prevents such conspiracy and it reduces the typical size of the matrix by a factor of
$\sqrt M$, similarly to the central limit theorem (note the $\sqrt N$ normalization in (1.9)).

Matrices offer a much richer structure than studying only their norm. Assuming $M = N$, the most
important characteristics of a square matrix are the eigenvalues and eigenvectors. As (1.12) suggests, it is
convenient to assume zero expectation for the matrix elements and rescale the matrix by a factor $N^{-1/2}$ to
have a norm of order 1. For most of this presentation, we will therefore consider large $N \times N$ square matrices
of the form

$$H = \begin{pmatrix} h_{11} & h_{12} & \ldots & h_{1N} \\ h_{21} & h_{22} & \ldots & h_{2N} \\ \vdots & \vdots & & \vdots \\ h_{N1} & h_{N2} & \ldots & h_{NN} \end{pmatrix} \qquad (1.13)$$

with centered entries

$$\mathbb{E} h_{ij} = 0, \qquad i,j = 1,2,\ldots,N. \qquad (1.14)$$

As for the normalization, we assume that the matrix of variances

$$\Sigma := \begin{pmatrix} \sigma_{11}^2 & \sigma_{12}^2 & \ldots & \sigma_{1N}^2 \\ \sigma_{21}^2 & \sigma_{22}^2 & \ldots & \sigma_{2N}^2 \\ \vdots & \vdots & & \vdots \\ \sigma_{N1}^2 & \sigma_{N2}^2 & \ldots & \sigma_{NN}^2 \end{pmatrix} \qquad (1.15)$$

is doubly stochastic, i.e., for every $i = 1,2,\ldots,N$ we have

$$\sum_j \sigma_{ij}^2 = \sum_j \sigma_{ji}^2 = 1.$$

The most natural example is the mean-field model, when

$$\sigma_{ij}^2 = \frac{1}{N}, \qquad i,j = 1,2,\ldots,N,$$

i.e., each matrix element is of size $h_{ij} \sim N^{-1/2}$. This corresponds to the standard Wigner matrix. For most
of this presentation the reader can restrict the attention to this case.

Random matrices are typically subject to some symmetry restrictions, e.g. we will consider symmetric
($h_{ij} = h_{ji} \in \mathbb{R}$) or hermitian ($h_{ij} = \bar h_{ji} \in \mathbb{C}$) random matrices. We will mostly assume that the matrix
elements are independent up to the symmetry requirement (i.e. in case of symmetric or hermitian matrices,
the variables $\{h_{ij} : i \le j\}$ are independent). This leads us to the



Definition 1.1 An $N \times N$ symmetric or hermitian random matrix (1.13) is called universal Wigner
matrix (ensemble) if the entries are centered (1.14), their variances $\sigma_{ij}^2 = \mathbb{E}|h_{ij}|^2$ satisfy

$$\sum_{j=1}^N \sigma_{ij}^2 = 1, \qquad i = 1,2,\ldots,N, \qquad (1.16)$$

and $\{h_{ij} : i \le j\}$ are independent. An important subclass of universal Wigner ensembles is called generalized
Wigner matrices (ensembles) if additionally, the variances are comparable, i.e.

$$0 < C_{inf} \le N\sigma_{ij}^2 \le C_{sup} < \infty, \qquad i,j = 1,2,\ldots,N, \qquad (1.17)$$

holds with some fixed positive constants $C_{inf}$, $C_{sup}$. In the special case $\sigma_{ij}^2 = 1/N$, we recover the original
definition of the Wigner matrices or Wigner ensemble [106].
The most prominent Wigner ensembles are the Gaussian Orthogonal Ensemble (GOE) and the Gaussian
Unitary Ensemble (GUE); i.e., symmetric and hermitian Wigner matrices with rescaled matrix elements
$\sqrt N h_{ij}$ being standard Gaussian variables (in the hermitian case, $\sqrt N h_{ij}$ is a standard complex Gaussian
variable, i.e. $\mathbb{E}|\sqrt N h_{ij}|^2 = 1$).

For simplicity of the presentation, in case of the Wigner ensembles, we will assume that $h_{ij}$, $i < j$, are
identically distributed (i.e. not only their variances are the same). In this case we fix a distribution $\nu$ and we
assume that the rescaled matrix elements $\sqrt N h_{ij}$ are distributed according to $\nu$. Depending on the symmetry
type, the diagonal elements may have a slightly different distribution, but we will omit this subtlety from
the discussion. The distribution $\nu$ will be called the single entry distribution of $H$.

We will sometimes mention a special class of universal Wigner matrices that have a band structure; they
will be called random band matrices. The variances are given by

$$\sigma_{ij}^2 = W^{-1} f\Big(\frac{[i-j]_N}{W}\Big), \qquad (1.18)$$

where $W \gg 1$, $f : \mathbb{R} \to \mathbb{R}_+$ is a bounded nonnegative symmetric function with $\int f(x)dx = 1$ and we defined
$[i-j]_N \in \mathbb{Z}$ by the property that $[i-j]_N \equiv i - j \mod N$ and $-\frac{1}{2}N < [i-j]_N \le \frac{1}{2}N$. Note that the
relation (1.16) holds only asymptotically as $W \to \infty$ but this can be remedied by an irrelevant rescaling.
One can even consider d-dimensional band matrices, where the rows and columns are labelled by a finite
lattice $\Lambda \subset \mathbb{Z}^d$ and $\sigma_{ij}^2$ depends on the difference $i - j$ for any $i,j \in \Lambda$.
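
The band structure can be made concrete with a small Python sketch. It assumes the variance profile $\sigma_{ij}^2 = W^{-1} f([i-j]_N / W)$ with the periodic difference $[i-j]_N$ taken in $(-N/2, N/2]$, builds the matrix of variances for two sample profiles $f$ (a box function and a Gaussian density, both chosen only for illustration), and reports the row sums and the spread $M = 1/\max_{ij}\sigma_{ij}^2$. All function names are hypothetical.

```python
import math


def wrapped_difference(i, j, n):
    """[i - j]_N: the representative of i - j mod N in (-N/2, N/2]."""
    k = (i - j) % n
    return k - n if k > n / 2 else k


def band_variances(n, w, f):
    """Variance profile sigma_ij^2 = f([i - j]_N / W) / W as an n x n list of lists."""
    return [[f(wrapped_difference(i, j, n) / w) / w for j in range(n)] for i in range(n)]


def box(x):
    """Indicator of [-1/2, 1/2]; it integrates to 1."""
    return 1.0 if abs(x) <= 0.5 else 0.0


def gaussian(x):
    """Standard normal density; it integrates to 1."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def row_sums(sigma2):
    """Row sums of the variance matrix; these should be close to 1."""
    return [sum(row) for row in sigma2]


def spread(sigma2):
    """M = 1 / max_ij sigma_ij^2."""
    return 1.0 / max(max(row) for row in sigma2)


if __name__ == "__main__":
    s = band_variances(50, 11, box)
    print(row_sums(s)[0], spread(s))  # row sum close to 1, spread equal to W up to rounding
```

For the box profile with odd $W$ the row sums equal 1 up to rounding and the spread coincides with the band width; for a general profile the row sums are only a Riemann sum approximation of $\int f = 1$, which is the reason for the rescaling mentioned above.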

Another class of random matrices, that even predates Wigner, are the random covariance matrices. These
are matrices of the form

$$H = X^*X, \qquad (1.19)$$

where $X$ is a rectangular $M \times N$ matrix of the form (1.11) with centered i.i.d. entries with variance
$\mathbb{E}|X_{ij}|^2 = M^{-1}$. Note that the matrix elements of $H$ are not independent, but they are generated from the
independent matrix elements of $X$ in a straightforward way. These matrices appear in statistical samples and
were first considered by Wishart [107]. In the case when $X_{ij}$ are centered Gaussian, the random covariance
matrices are called Wishart matrices or ensemble.

1.3 Motivations: from Schrödinger operators to the $\zeta$-function

We will primarily study the eigenvalue statistics of large random matrices and some results about eigenvectors 
will also be mentioned. The main physical motivation is that a random matrix can model the Hamilton 
operator of a disordered quantum system. The symmetry properties of H stem from this consideration: 
symmetric matrices represent Hamiltonians of systems with time reversal invariance (e.g. no magnetic field), 
hermitian matrices correspond to systems without time reversal symmetry. (There is a third class of matrices, 
the quaternion self-dual matrices, most prominently modelled by the Gaussian Symplectic Ensemble (GSE),
that describe systems with odd-spin and no rotational symmetry, but we will not discuss it in detail.) 

E. Wigner originally invented random matrices to mimic the eigenvalues of the unknown Hamiltonian
of heavy nuclei; lacking any information, he assumed that the matrix elements are i.i.d. random variables 
subject to the hermitian condition. His very bold vision was that, although such a crude approximation 
cannot predict individual energy levels (eigenvalues) of the nucleus, their statistical properties may be characteristic
of some global feature shared by any nucleus. By comparing measured data of energy levels of
nuclei with numerical calculations of eigenvalues of certain random matrices, he found that the level statistics, 
i.e., the distribution of the energy gaps between neighboring energy levels (eigenvalues), show remarkable 
coincidence and robustness. In particular, he observed that energy levels tend to repel each other, a significant
difference from the level statistics of fully uncorrelated random points (Poisson point process). A similar
feature was found for random matrices: even Wigner matrices that are "as stochastic as possible" delivered
plots of strongly correlated (repelling) eigenvalues. This correlation is due to the underlying fundamental 
symmetry of the matrix ensemble, in particular symmetric and hermitian matrices were found to have a 
different strength of level repulsion, but within a fixed symmetry class a universal pattern emerged. For 
more details on the history of this remarkable discovery, see [70]. 
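
Level repulsion can already be seen in the smallest possible example. The Python sketch below is a toy computation, not the large $N$ statistics discussed in this survey: it samples $2 \times 2$ real symmetric Gaussian matrices (diagonal variance 1, off-diagonal variance 1/2, an assumed GOE-type normalization), computes the gap between the two eigenvalues in closed form, and compares the frequency of very small gaps with that of the gaps of a Poisson point process. Gaps are measured relative to their sample mean; the threshold 0.1, the sample size, the seed and the function names are illustrative choices.

```python
import math
import random


def goe_gap_2x2(rng):
    """Eigenvalue gap of a 2 x 2 real symmetric Gaussian matrix [[a, b], [b, d]].

    The eigenvalues are (a + d)/2 +- sqrt(((a - d)/2)^2 + b^2),
    so the gap equals sqrt((a - d)^2 + 4 b^2).
    """
    a = rng.gauss(0.0, 1.0)
    d = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, math.sqrt(0.5))
    return math.sqrt((a - d) ** 2 + 4.0 * b * b)


def poisson_gap(rng):
    """Gap between neighbouring points of a unit-rate Poisson point process."""
    return rng.expovariate(1.0)


def small_gap_fraction(gaps, threshold=0.1):
    """Fraction of gaps below `threshold` times the mean gap."""
    mean = sum(gaps) / len(gaps)
    return sum(1 for g in gaps if g < threshold * mean) / len(gaps)


def compare(samples=20000, seed=1):
    """Return the small-gap fractions for the 2 x 2 matrices and for Poisson points."""
    rng = random.Random(seed)
    goe = [goe_gap_2x2(rng) for _ in range(samples)]
    poi = [poisson_gap(rng) for _ in range(samples)]
    return small_gap_fraction(goe), small_gap_fraction(poi)


if __name__ == "__main__":
    print(compare())
```

Small gaps should come out roughly an order of magnitude rarer for the matrix eigenvalues than for the uncorrelated points, since the gap density of the $2 \times 2$ model vanishes linearly at zero while the exponential density does not.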

Universality of local eigenvalue statistics is believed to hold for a much broader class of matrix ensembles 
than we have introduced. There is no reason to believe that the matrix elements of the Hamiltonian of 
the heavy nuclei are indeed i.i.d. random variables. Conceivably, the matrix elements need not be fully 
independent or identically distributed for universality. There is little known about matrices with correlated 
entries, apart from the unitary invariant ensembles (Section 1.5.3) that represent a very specific correlation. 
In case of a certain class of Wigner matrices with weakly correlated entries, the semicircle law and its 
Gaussian fluctuation have been proven [83, 84]. 

Much more studied are various classes of random matrices with independent but not identically distributed
entries. The most prominent example is the tight binding Anderson model [6], i.e., a Schrödinger
operator, $-\Delta + \lambda V$, on a regular square lattice $\mathbb{Z}^d$ with a random on-site potential $V$ and disorder strength $\lambda$.
This model describes electron propagation (conductance) in an ionic lattice with a disordered environment. 
Restricted to a finite box, it can be represented by a matrix whose diagonal elements are i.i.d. random 
variables; the deterministic off-diagonal elements are given by the Laplacian. 
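
To make the matrix representation explicit, here is a minimal Python sketch of the Anderson Hamiltonian restricted to a finite box. Several conventions are assumed for the illustration and are not fixed by the text: dimension $d = 1$, Dirichlet boundary conditions, the discrete Laplacian normalized so that $-\Delta$ has 2 on the diagonal and $-1$ between nearest neighbors, and a potential that is i.i.d. uniform on $[-1, 1]$. The function name is hypothetical.

```python
import random


def anderson_matrix_1d(length, disorder, seed=0):
    """Matrix of -Laplacian + disorder * V on the box {0, ..., length - 1}.

    Assumed conventions: d = 1, Dirichlet boundary, -Laplacian with 2 on the
    diagonal and -1 between nearest neighbours, V_x i.i.d. uniform on [-1, 1].
    Returns the matrix (list of lists) and the sampled potential.
    """
    rng = random.Random(seed)
    potential = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    h = [[0.0] * length for _ in range(length)]
    for x in range(length):
        h[x][x] = 2.0 + disorder * potential[x]  # random on-site term
        if x + 1 < length:
            h[x][x + 1] = -1.0  # deterministic hopping from the Laplacian
            h[x + 1][x] = -1.0
    return h, potential


if __name__ == "__main__":
    matrix, _ = anderson_matrix_1d(5, 0.5)
    for row in matrix:
        print(row)
```

In contrast to a Wigner matrix, only the diagonal is random here and almost all entries vanish, which is the sparsity referred to later in this section.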

The general formulation of the universality conjecture for random Schrödinger operators states that
there are two distinctive regimes depending on the energy and the disorder strength. In the strong disorder
regime, the eigenfunctions are localized and the local spectral statistics are Poisson. In the weak disorder
regime, the eigenfunctions are delocalized and the local statistics coincide with those of a Gaussian matrix
ensemble. Separate conjectures, that will not be discussed here, relate these two regimes to chaotic vs.
integrable behavior of the underlying classical dynamical system. According to the Berry-Tabor conjecture
[12], Poisson statistics of eigenvalues should emerge from quantizations of integrable classical dynamics, while
random matrix theory stems from quantization of chaotic classical dynamics (Bohigas, Giannoni, Schmit [15]).

Returning to the more concrete Anderson model, in space dimensions three or higher and for weak
randomness, the model is conjectured to exhibit metal-insulator transition, i.e., in $d \ge 3$ dimensions the
eigenfunctions of $-\Delta + \lambda V$ are delocalized for small $\lambda$, while they are localized for large $\lambda$. It is a fundamental
open mathematical question to establish this transition. 

The localization regime at large disorder or near the spectral edges has been well understood by Fröhlich
and Spencer with the multiscale technique [56, 57], and later by Aizenman and Molchanov by the fractional 
moment method [1]; many other works have since contributed to this field. In particular, it has been 
established that the local eigenvalue statistics are Poisson [73] and that the eigenfunctions are exponentially 
localized with an upper bound on the localization length that diverges as the energy parameter approaches 
the presumed phase transition point [92, 35]. 

The progress in the delocalization regime has been much slower. For the Bethe lattice, corresponding to 
the infinite-dimensional case, delocalization has been established in [66, 2, 54] (in apparent contradiction
to the general conjectures, the eigenvalue statistics, however, are Poisson but for a well understood specific
reason [3]). In finite dimensions only partial results are available. The existence of an absolutely continuous
spectrum (i.e., extended states) has been shown for a rapidly decaying potential, corresponding to a scattering 
regime [80, 16, 25]. Diffusion has been established for a heavy quantum particle immersed in a phonon field 
in $d \ge 4$ dimensions [55]. For the original Anderson Hamiltonian with a small coupling constant $\lambda$, the
eigenfunctions have a localization length of at least $\lambda^{-2}$ [19]. The time and space scale $\lambda^{-2}$ corresponds to
the kinetic regime where the quantum evolution can be modelled by a linear Boltzmann equation [94, 47].
Beyond this time scale the dynamics is diffusive. This has been established in the scaling limit $\lambda \to 0$ up to
time scales $t \sim \lambda^{-2-\kappa}$ with an explicit $\kappa > 0$ in [38].

There are no rigorous results on the local spectral statistics of the Anderson model in the disordered 
regime, but it is conjectured - and supported by numerous arguments in the physics literature, especially 
by supersymmetric methods (see [34]) - that the local correlation functions of the eigenvalues of the finite
volume Anderson model follow the GOE statistics in the thermodynamic limit. GUE statistics are expected 
if an additional magnetic field breaks the time-reversal symmetry of the Anderson Hamiltonian. Based upon 
this conjecture, the local eigenvalue statistics are used to compute the phase diagram numerically. It is very 
remarkable that the random Schrödinger operator, represented by a very sparse random matrix, exhibits the
same universality class as the full Wigner matrix, at least in a certain energy range. 

Due to their mean-field character, Wigner matrices are simpler to study than the Anderson model and 
they are always in the delocalization regime. In this survey we mainly focus on Wigner matrices, but
we keep in mind the original motivation from general disordered systems. In particular, we will study 
not only eigenvalue statistics but also eigenvectors that are shown to be completely delocalized [42]. The 
local spectral statistics in the bulk are universal, i.e., they follow the statistics of the corresponding Gaussian
ensemble (GOE, GUE, GSE), depending on the symmetry type of the matrix. This topic will be the main 
goal of this presentation. 

To close this section, we mention some other possible research directions that we will not pursue here 
further. The list is incomplete. 

A natural intermediate class of ensembles between the fully stochastic Wigner matrices and the Anderson
model with diagonal randomness is the family of random band matrices. These are hermitian or symmetric
random matrices $H$ with independent but not identically distributed entries. The variance of $h_{ij}$ depends
only on $|i - j|$ and it becomes negligible if $|i - j|$ exceeds a given parameter $W$, the band-width; for
example, $\sigma_{ij}^2 = \mathbb{E}|h_{ij}|^2 \sim \exp(-|i - j|/W)$. It is conjectured [58] that the system is completely delocalized
if $W \gg \sqrt{N}$, otherwise the localization length is $W^2$. Moreover, for narrow bands, $W \ll \sqrt{N}$, the local
eigenvalue statistics are expected to be Poisson, while for broad bands, $W \gg \sqrt{N}$, they should be given by
GUE or GOE, depending on the symmetry class. Localization properties of $H$ for $W \ll N^{1/8}$ and an $O(W^8)$
upper bound on the localization length have been shown by J. Schenker [82] but not local statistics. From
the delocalization side, with A. Knowles we recently proved [36, 37] diffusion up to time scale $t \ll W^{1/3}$
which implies that the localization length is at least $W^{1+1/6}$.

We mention that universality of local eigenvalue statistics is often investigated by supersymmetric techniques
in the physics literature. These methods are extremely powerful to extract the results by saddle point
computations, but the analysis justifying the saddle point approximation still lacks mathematical rigor. So
far only the density of states has been investigated rigorously by using this technique [26]. Quantum diffusion
can also be studied by supersymmetry and certain intermediate models can be rigorously analyzed [27, 28]. 

Finally, we point out that we focused on the physical motivations coming from disordered quantum systems,
but random matrices appear in many other branches of physics and mathematics. They are fundamental
objects of nature with an extremely rich structure. The most remarkable connection is with the $\zeta$-function.
It is conjectured that the roots of the Riemann $\zeta$-function, $\zeta(s) := \sum_{n=1}^\infty n^{-s}$, lying on the vertical line
$\mathrm{Re}\, s = \frac{1}{2}$, have the same local statistics as the GUE (after appropriate rescaling). A review and further
references to the extensive numerical evidence can be found in the classical book of Mehta [70].






1.4 Eigenvalue density and delocalization 

For a symmetric or hermitian matrix $H$, let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ denote the eigenvalues. They form a random
point process on the real line with a distribution generated from the joint probability law of the matrix 
elements. Since the functional relation between matrix elements and eigenvalues is highly nontrivial, the 
product measure on the entries turns out to generate a complicated and highly correlated measure for the 
eigenvalues. Our main goal is to understand this induced measure. 

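Under the chosen normalization (1.16), the typical size of the eigenvalues is of order one. We will prove a
much more precise statement later, but it is instructive to have a rough feeling about the size via computing
$\mathrm{Tr}\, H^2$ in two ways:

$$\sum_{i=1}^N \lambda_i^2 = \mathrm{Tr}\, H^2 = \sum_{i,j=1}^N |h_{ij}|^2.$$

Taking expectation and using (1.16) we have

$$\frac{1}{N}\sum_i \mathbb{E}\lambda_i^2 = \frac{1}{N}\sum_{ij} \sigma_{ij}^2 = 1,$$

i.e., in an average sense $\mathbb{E}\lambda_i^2 = 1$.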
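The two ways of computing the trace are easy to compare numerically. The following is a minimal sketch, assuming a real symmetric Gaussian matrix whose off-diagonal entries have variance $1/N$ as a stand-in for the normalization (1.16); the helper name `sample_wigner`, the seed and the matrix size are illustrative choices, not taken from the text.

```python
import numpy as np

def sample_wigner(n, rng):
    """Real symmetric Gaussian matrix; off-diagonal variance 1/n (assumed normalization)."""
    g = rng.normal(size=(n, n))
    return (g + g.T) / np.sqrt(2 * n)

rng = np.random.default_rng(0)
n = 400
h = sample_wigner(n, rng)
eigs = np.linalg.eigvalsh(h)

trace_from_eigs = np.sum(eigs ** 2)      # sum_i lambda_i^2
trace_from_entries = np.sum(h ** 2)      # sum_{ij} |h_ij|^2
mean_square = trace_from_eigs / n        # average of lambda_i^2, expected near 1
```

Both traces agree up to rounding, and `mean_square` stays close to one, in line with the order-one size of the eigenvalues.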



1.4.1 Wigner semicircle law and other canonical densities 

The empirical distribution of eigenvalues follows a universal pattern, the Wigner semicircle law. To formulate
it more precisely, note that the typical spacing between neighboring eigenvalues is of order $1/N$, so in a fixed
interval $[a, b] \subset \mathbb{R}$, one expects macroscopically many (of order $N$) eigenvalues. More precisely, it can be
shown (the first proof was given by Wigner [106]) that for any fixed real numbers $a < b$,

$$\lim_{N\to\infty} \frac{1}{N}\#\{i : \lambda_i \in [a, b]\} = \int_a^b \varrho_{sc}(x)\, dx, \qquad \varrho_{sc}(x) := \frac{1}{2\pi}\sqrt{(4 - x^2)_+}, \qquad (1.20)$$

where $(a)_+ := \max\{a, 0\}$ denotes the positive part of the number $a$. Note the emergence of the universal
density, the semicircle law, that is independent of the details of the distribution of the matrix elements.
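As an illustration of (1.20), one can compare the fraction of eigenvalues in a fixed interval with the semicircle mass of that interval. This is only a sketch under stated assumptions: a single Gaussian symmetric sample with off-diagonal variance $1/N$, an arbitrary test interval, and a helper `semicircle_mass` that we introduce here using the elementary antiderivative of $\sqrt{4 - x^2}$.

```python
import numpy as np

def semicircle_mass(a, b):
    """Integral of rho_sc(x) = sqrt((4 - x^2)_+) / (2 pi) over [a, b]."""
    def cdf(x):
        x = np.clip(x, -2.0, 2.0)
        return 0.5 + x * np.sqrt(4.0 - x * x) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi
    return cdf(b) - cdf(a)

rng = np.random.default_rng(1)
n = 1000
g = rng.normal(size=(n, n))
h = (g + g.T) / np.sqrt(2 * n)                  # symmetric, off-diagonal variance 1/n
eigs = np.linalg.eigvalsh(h)

a, b = -1.0, 0.5                                # fixed test interval (illustrative)
empirical = np.mean((eigs >= a) & (eigs <= b))  # (1/N) #{i : lambda_i in [a, b]}
predicted = semicircle_mass(a, b)
```

Replacing the Gaussian entries by another centered distribution with the same variance should leave `empirical` essentially unchanged, which is the universality expressed by the semicircle law.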

The semicircle law is characteristic for the universal Wigner matrices (see Definition 1.1). For random
square matrices with independent entries but without symmetry (i.e., $h_{ij}$ are independent for all $i, j$) a similar
universal pattern emerges, the circular law. For example, if $h_{ij}$ are centered i.i.d. random variables with
common variance $\sigma_{ij}^2 = N^{-1}$, then the empirical density of eigenvalues converges to the uniform measure
on the unit disk in the complex plane [99]. If independence is dropped, one can get many different density
profiles.

For example, in case of the random covariance matrices (1.19), the empirical density of eigenvalues $\lambda_i$ of
$H$ converges to the Marchenko-Pastur law [69] in the limit when $M, N \to \infty$ such that $d = N/M$ is fixed,
$0 < d < 1$:

$$\lim_{N\to\infty} \frac{1}{N}\#\{i : \lambda_i \in [a, b]\} = \int_a^b \varrho_{MP}(x)\, dx, \qquad \varrho_{MP}(x) := \frac{1}{2\pi d}\sqrt{\frac{[(\lambda_+ - x)(x - \lambda_-)]_+}{x^2}}, \qquad (1.21)$$

with $\lambda_\pm := (1 \pm \sqrt{d})^2$ being the spectral edges. Note that in case $M < N$, the matrix $H$ has macroscopically
many zero eigenvalues, otherwise the spectra of $XX^*$ and $X^*X$ coincide so the Marchenko-Pastur law can
be applied to all nonzero eigenvalues with the role of $M$ and $N$ exchanged.

1.4.2 The moment method



The eigenvalue density is commonly approached via the fairly robust moment method (see [4] for an exposé)
that was also the original approach of Wigner to prove the semicircle law [106]. For example, for hermitian
Wigner matrices, it consists of computing traces of high powers of $H$, i.e.,

$$\mathbb{E}\, \mathrm{Tr}\, H^{2k}$$

by expanding the product as

$$\mathbb{E} \sum_{i_1, i_2, \ldots, i_{2k}} h_{i_1 i_2} h_{i_2 i_3} \cdots h_{i_{2k} i_1}$$

and noticing that each factor $h_{xy}$ must be paired with at least another copy $h_{yx} = \bar{h}_{xy}$, otherwise
the expectation value is zero. The possible index sequences that satisfy this pairing condition can be
classified according to their complexity, and it turns out that the main contribution comes from the so-called
backtracking paths. These are index sequences $i_1 i_2 i_3 \ldots i_{2k} i_1$, returning to the original index $i_1$, that can be
successively generated by a substitution rule

$$a \to aba, \qquad b \in \{1, 2, \ldots, N\}, \quad b \ne a,$$

with an arbitrary index $b$. These index sequences satisfy the pairing condition in an obvious manner and
it turns out that they involve the largest possible number ($N^k$) of independent indices. The number of backtracking
paths is explicitly given by the Catalan numbers, $C_k = \frac{1}{k+1}\binom{2k}{k}$, so $\mathbb{E}\, \mathrm{Tr}\, H^{2k}$ can be computed fairly
precisely for each finite $k$:

$$\frac{1}{N}\mathbb{E}\, \mathrm{Tr}\, H^{2k} = \frac{1}{k+1}\binom{2k}{k} + O_k(N^{-2}). \qquad (1.22)$$

Note that the number of independent labels, $N^k$, exactly cancels the size of the $k$-fold product of variances,
$(\mathbb{E}|h|^2)^k = N^{-k}$. If the distribution of the matrix elements is symmetric, then the traces of odd powers all
vanish since they can never satisfy the pairing condition. Without the symmetry condition the traces of odd
powers are non-zero but negligible.
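The appearance of the Catalan numbers in (1.22) can be observed on a single sample. The sketch below assumes a hermitian Gaussian matrix with $\mathbb{E}|h_{ij}|^2 = 1/N$ and evaluates $\frac{1}{N}\mathrm{Tr}\, H^{2k}$ through the eigenvalues; the helper `catalan`, the seed and the size are our own illustrative choices.

```python
import math
import numpy as np

def catalan(k):
    """Catalan number C_k = binom(2k, k) / (k + 1)."""
    return math.comb(2 * k, k) // (k + 1)

rng = np.random.default_rng(2)
n = 500
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
h = (a + a.conj().T) / (2 * np.sqrt(n))     # hermitian, E|h_ij|^2 = 1/n
eigs = np.linalg.eigvalsh(h)

# (1/N) Tr H^{2k} from the eigenvalues, next to the Catalan number C_k
moments = {k: float(np.mean(eigs ** (2 * k))) for k in (1, 2, 3)}
catalans = {k: catalan(k) for k in (1, 2, 3)}
```

For small $k$ the sample moments should sit close to $1, 2, 5$; for larger $k$ the agreement degrades at fixed $N$, which reflects the $k$-dependence of the error term.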

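We will compute the trace of the resolvent, or the Stieltjes transform of the empirical density

$$\varrho_N(dx) := \frac{1}{N}\sum_{j=1}^N \delta(x - \lambda_j)$$

of the eigenvalues, i.e. we define

$$m(z) = m_N(z) := \frac{1}{N}\mathrm{Tr}\, \frac{1}{H - z} = \frac{1}{N}\sum_{j=1}^N \frac{1}{\lambda_j - z} = \int_{\mathbb{R}} \frac{d\varrho_N(x)}{x - z} \qquad (1.23)$$

for any $z = E + i\eta$, $E \in \mathbb{R}$, $\eta > 0$. For large $z$ one can expand $m_N$ as follows

$$m_N(z) = \frac{1}{N}\mathrm{Tr}\, \frac{1}{H - z} = -\frac{1}{Nz}\sum_{m=0}^\infty \mathrm{Tr}\Big(\frac{H}{z}\Big)^m, \qquad (1.24)$$

so after taking the expectation, using (1.22) and neglecting the error terms, we get

$$\mathbb{E}\, m_N(z) \approx -\sum_{k=0}^\infty \frac{1}{k+1}\binom{2k}{k} z^{-(2k+1)}, \qquad (1.25)$$

which, after some calculus, can be identified as the power series of $\frac{1}{2}(-z + \sqrt{z^2 - 4})$. The approximation
becomes exact in the $N \to \infty$ limit. Although the expansion (1.24) is valid only for large $z$, given that the
limit is an analytic function of $z$, one can extend the relation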



$$\lim_{N\to\infty} \mathbb{E}\, m_N(z) = \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big)$$

by analytic continuation to the whole upper half plane $z = E + i\eta$, $\eta > 0$. It is an easy exercise to see that
this is exactly the Stieltjes transform of the semicircle density, i.e.,

$$m_{sc}(z) := \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big) = \int_{\mathbb{R}} \frac{\varrho_{sc}(x)\, dx}{x - z}. \qquad (1.26)$$

The square root function is chosen with a branch cut in the segment $[-2, 2]$ so that $\sqrt{z^2 - 4} \sim z$ at infinity.
This guarantees that $\mathrm{Im}\, m_{sc}(z) > 0$ for $\mathrm{Im}\, z > 0$. Since the Stieltjes transform identifies the measure
uniquely, and pointwise convergence of Stieltjes transforms implies weak convergence of measures, we obtain

$$\mathbb{E}\, d\varrho_N(x) \rightharpoonup \varrho_{sc}(x)\, dx. \qquad (1.27)$$

With slightly more effort one can show that

$$\lim_{N\to\infty} m_N(z) = \frac{1}{2}\big(-z + \sqrt{z^2 - 4}\big) \qquad (1.28)$$

holds with high probability, i.e., the convergence holds also in probability, not only in expectation. For more
details, see [4].
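The statements of this subsection can be probed numerically. The sketch below is not part of the cited proofs; it assumes a Gaussian symmetric sample and uses three helpers of our own: `m_sc` (the explicit formula of (1.26), with the branch fixed by requiring a positive imaginary part), `m_empirical` (the definition (1.23)) and `m_series` (the truncated large-$z$ expansion with Catalan coefficients).

```python
import math
import numpy as np

def m_sc(z):
    """Stieltjes transform of the semicircle law; branch with Im m > 0 for Im z > 0."""
    root = np.sqrt(z * z - 4 + 0j)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

def m_empirical(eigs, z):
    """m_N(z) = (1/N) sum_j 1 / (lambda_j - z)."""
    return np.mean(1.0 / (eigs - z))

def m_series(z, terms=60):
    """Truncated expansion -sum_k C_k z^{-(2k+1)}; it converges only for |z| > 2."""
    return -sum(math.comb(2 * k, k) // (k + 1) * z ** (-(2 * k + 1)) for k in range(terms))

rng = np.random.default_rng(4)
n = 1000
g = rng.normal(size=(n, n))
h = (g + g.T) / np.sqrt(2 * n)
eigs = np.linalg.eigvalsh(h)

z_far, z_bulk = 3j, 0.5 + 0.5j
series_gap = abs(m_series(z_far) - m_sc(z_far))            # power series vs closed form
sample_gap = abs(m_empirical(eigs, z_bulk) - m_sc(z_bulk)) # one sample vs the limit
```

The point `z_bulk` lies inside the disk $|z| \le 2$, where the series is useless but the closed form, obtained by analytic continuation, still matches the sample.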

1.4.3 The local semicircle law 

The moment method can typically identify the resolvent for any fixed $z$ and thus give the semicircle law as
a weak limit, i.e., (1.20) will hold for any fixed interval $I := [a, b]$ as $N \to \infty$. However, a fixed interval $I$
with length $|I|$ typically contains of order $N|I|$ eigenvalues. It is natural to ask whether the semicircle law
holds locally as well, i.e., for intervals whose length may shrink with $N$, but still $N|I| \gg 1$. After all, the
semicircle law is a type of law of large numbers that should require only that the number of random objects
in consideration goes to infinity. Due to the formula

$$\mathrm{Im}\, m_N(z) = \frac{1}{N}\sum_{i=1}^N \frac{\eta}{(\lambda_i - E)^2 + \eta^2} \sim \frac{\pi}{N}\sum_{i=1}^N \delta_\eta(\lambda_i - E), \qquad z = E + i\eta,$$

where $\delta_\eta$ denotes an approximate delta function on scale $\eta$, we see that knowing the Stieltjes transform for
some $z \in \mathbb{C}$ with $\mathrm{Im}\, z = \eta$ is essentially equivalent to knowing the local density on scale $\eta$, i.e., in an interval
of length $|I| \sim \eta$.
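The formula above says that $\mathrm{Im}\, m_N(E + i\eta)/\pi$ is the eigenvalue density smoothed on scale $\eta$. A small sketch, assuming a Gaussian symmetric sample and an illustrative choice $\eta = 0.05$ (well above $1/N$ for the size used here):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
g = rng.normal(size=(n, n))
h = (g + g.T) / np.sqrt(2 * n)
eigs = np.linalg.eigvalsh(h)

energy, eta = 0.0, 0.05
z = energy + 1j * eta
im_m = np.mean(1.0 / (eigs - z)).imag                           # Im m_N(z)
poisson_sum = np.mean(eta / ((eigs - energy) ** 2 + eta ** 2))  # same quantity, written out
local_density = im_m / np.pi                                    # density smoothed on scale eta
rho_sc = np.sqrt(4.0 - energy ** 2) / (2.0 * np.pi)             # semicircle density at E
```

Decreasing `eta` towards $1/N$ makes `local_density` fluctuate more strongly from sample to sample, which is why the regime $\eta \gg 1/N$ is the natural limit of such statements.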

In [39, 40, 41] we proved that the local semicircle law holds on the smallest possible scale of $\eta \gg 1/N$, i.e.,
the limit (1.20) holds even if the length of $I = [a, b]$ is essentially of order $1/N$, hence it typically contains
only a large but finite number of eigenvalues. This will be the key technical input for further investigations
on local spectral statistics. There are several versions of the local semicircle law; we will give three precise 
statements: Theorem 1.9 (from [41]), Theorem 2.5 (from [49]) and Theorem 2.19 (from [50]).

The method of the proof is different from the moment method, but we still work with the resolvent, or
the Stieltjes transform. The key observation (see also several previous works, e.g. [9, 69]) is that the Stieltjes
transform $m_{sc}(z)$ of the semicircle density $\varrho_{sc}$ satisfies the following simple quadratic equation:

$$m_{sc}(z) + \frac{1}{z + m_{sc}(z)} = 0, \qquad (1.29)$$

and among the two possible solutions, $m_{sc}(z)$ is identified as explained after (1.26). The strategy is that
expanding the empirical Stieltjes transform $m_N(z)$ (1.23) according to minors of $H$, we prove that $m_N$
satisfies the self-consistent equation (1.29) approximately and with a high probability:

$$m_N(z) + \frac{1}{z + m_N(z)} \approx 0.$$

Then we conclude the proof of $m_N \approx m_{sc}$ by invoking the stability of the equation (1.29). Since the stability
deteriorates near the edges, $E = \mathrm{Re}\, z \approx \pm 2$, the estimate will be weaker there, indicating that the eigenvalue
fluctuation is larger at the edge.
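For completeness, the quadratic equation (1.29) can be read off from the explicit formula in (1.26) in a few lines. The following is a routine computation added for the reader's convenience; it is not quoted from the cited works.

```latex
% Starting point: m_{sc}(z) = \tfrac{1}{2}\bigl(-z + \sqrt{z^2 - 4}\bigr), see (1.26).
\begin{align*}
  2 m_{sc}(z) + z &= \sqrt{z^2 - 4}, \\
  \bigl(2 m_{sc}(z) + z\bigr)^2 &= z^2 - 4, \\
  4 m_{sc}(z)^2 + 4 z\, m_{sc}(z) + 4 &= 0, \\
  m_{sc}(z)\bigl(z + m_{sc}(z)\bigr) &= -1, \\
  m_{sc}(z) + \frac{1}{z + m_{sc}(z)} &= 0.
\end{align*}
```

The fourth line shows that $z + m_{sc}(z)$ never vanishes, so the division in the last step is legitimate. The second root of the quadratic is $1/m_{sc}(z)$, whose imaginary part has the opposite sign, which is why the condition $\mathrm{Im}\, m_{sc}(z) > 0$ singles out the right solution.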

Our best results in this direction are obtained in [50] (which is partly a streamlined version of [48, 49]), 
where not only the trace of the Green function (1.23) but also individual diagonal elements were shown to 
be given by the semicircle law. The results were already listed informally in Section 1.1.1 and we pointed 
out that they hold also for universal Wigner matrices, see Theorems 2.5 and 2.19. 

For the universal Wigner ensembles Guionnet [59] and Anderson-Zeitouni [5] already proved that the
density of the eigenvalues converges to the Wigner semicircle law on a large scale; our result improves this
to small scales. For example, for band matrices (for the definition see (1.18)) with band width $W$ we obtain
that the semicircle law holds down to energy scales 1/W. The delocalization length is shown to be at least 
as large as the band width W. We note that a certain three dimensional version of Gaussian band matrices 
was also considered by Disertori, Pinson and Spencer [26] using the supersymmetric method. They proved 
that the expectation of the density of eigenvalues is smooth and it coincides with the Wigner semicircle law 
up to a precision determined by the bandwidth. 



1.4.4 Density of eigenvalues for invariant ensembles 

There is another natural way to define probability distributions on symmetric or hermitian matrices apart
from directly imposing a given probability law $\nu$ on their entries. They are obtained by defining a density
function directly on the set of matrices:

$$\mathcal{P}(H)\, dH := Z^{-1}\exp\big(-N\, \mathrm{Tr}\, V(H)\big)\, dH. \qquad (1.30)$$

Here $dH = \prod_{i \le j} dH_{ij}$ is the flat Lebesgue measure (in case of hermitian matrices and $i < j$, $dH_{ij}$ is the
Lebesgue measure on the complex plane $\mathbb{C}$). The function $V : \mathbb{R} \to \mathbb{R}$ is assumed to grow mildly at infinity
(some logarithmic growth would suffice) to ensure that the measure defined in (1.30) is finite, and $Z$ is
the normalization factor. Probability distributions of the form (1.30) are called invariant ensembles since
they are invariant under the orthogonal or unitary conjugation (in case of symmetric or hermitian matrices,
respectively). For example, in the hermitian case, for any fixed unitary matrix $U$, the transformation

$$H \to U^* H U$$

leaves the distribution (1.30) invariant thanks to $\mathrm{Tr}\, V(U^* H U) = \mathrm{Tr}\, V(H)$.

Wigner matrices and invariant ensembles form two different universes with quite different mathematical
tools available for their studies. In fact, these two classes are almost disjoint, the Gaussian ensembles being
the only invariant Wigner matrices. This is the content of the following lemma ([22] or Theorem 2.6.3 [70]).

Lemma 1.1 Suppose that the symmetric or hermitian matrix ensembles given in (1.30) have independent
entries $h_{ij}$, $i \le j$. Then $V(x)$ is a quadratic polynomial, $V(x) = ax^2 + bx + c$ with $a > 0$. This means that
apart from a trivial shift and normalization, the ensemble is GOE or GUE.

The density of eigenvalues of the invariant ensemble (1.30) is determined by a variational problem [22].
It is given by the equilibrium density of a gas with a logarithmic self-interaction and external potential $V$,
i.e., as the solution of

$$\inf_{\varrho}\Big\{\int_{\mathbb{R}}\int_{\mathbb{R}} \log|s - t|^{-1}\, \varrho(ds)\, \varrho(dt) + \int_{\mathbb{R}} V(t)\, \varrho(dt)\Big\},$$

where the infimum is taken over all probability measures $\varrho$. Under some mild conditions on $V$, the equilibrium
measure is absolutely continuous, $\varrho_{eq}(dt) = \varrho_{eq}(t)\, dt$ and it has compact support. If $V$ is a polynomial, then
the support consists of finitely many intervals. The empirical density of eigenvalues converges to $\varrho_{eq}$ in the
sense of (1.20) where $\varrho_{sc}$ is replaced with the function $\varrho_{eq}$. It is an easy exercise to check that the solution
of this variational problem for the Gaussian case, $V(x) = x^2/2$, is indeed $\varrho_{sc}$.

1.4.5 Delocalization of eigenvectors 

Apart from the statistics of the eigenvalues, one may also study the eigenvectors of a random matrix. In 
light of the universality conjecture about disordered systems explained in Section 1.3, it is a challenging 
question to test this hypothesis on the level of eigenvectors as well. Wigner matrices are mean-field models 
and from the physics intuition they are always in the delocalized regime. Of course they are still finite 
matrices, so they cannot have absolutely continuous spectrum, a standard signature for delocalization that
people working in random Schrödinger operators are often looking for. But the delocalization of eigenvectors
is a perfectly meaningful question for large but finite matrices as well. Surprisingly, this question was largely
neglected both by the random matrix community and the random Schrödinger operator community within
mathematics until T. Spencer raised it recently. He pointed out in a lecture that in the case of the
Gaussian ensembles, a simple invariance argument proves that the eigenvectors $v \in \mathbb{C}^N$ are fully delocalized
in the sense that their $\ell^4$-norm is $\|v\|_4 \sim N^{-1/4}$ (assuming $\|v\|_2 = 1$). This is a signature of strong
delocalization, since, on the one hand, by Schwarz inequality

$$N^{-1/2}\|v\|_2 = \Big(\frac{1}{N}\sum_{i=1}^N |v_i|^2\Big)^{1/2} \le \Big(\frac{1}{N}\sum_{i=1}^N |v_i|^4\Big)^{1/4} = N^{-1/4}\|v\|_4,$$

i.e., $\|v\|_4 \ge N^{-1/4}$ always holds, on the other hand this inequality is essentially saturated if all coordinates
of the eigenvector are of approximately the same size, $|v_i| \sim N^{-1/2}$.
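The $\ell^4$ criterion is simple to test on a sample. The sketch below assumes a GOE-type Gaussian matrix and computes, for every eigenvector, the ratio $\|v\|_4 / N^{-1/4}$, which by the Schwarz inequality above is at least one and which stays of order one for delocalized vectors. The seed and the size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
g = rng.normal(size=(n, n))
h = (g + g.T) / np.sqrt(2 * n)                 # GOE-type matrix
_, vecs = np.linalg.eigh(h)                    # columns are l2-normalized eigenvectors

l4_norms = np.sum(np.abs(vecs) ** 4, axis=0) ** 0.25
ratios = l4_norms * n ** 0.25                  # ||v||_4 / N^{-1/4}, never below 1
median_ratio = float(np.median(ratios))
```

For a vector distributed uniformly on the unit sphere one expects this ratio to concentrate near $3^{1/4} \approx 1.32$, since the fourth moment of a standard Gaussian is 3. A fully localized vector would instead give the much larger value $N^{1/4}$.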

The simple invariance argument works only for the Gaussian case, where the unitary invariance is present, 
but it is a natural question to ask whether eigenvectors of Wigner ensembles are also delocalized and the 
answer is affirmative. We have proved [41, Corollary 3.2] that if $v$ is an $\ell^2$-normalized eigenvector of a
Wigner matrix $H$ with eigenvalue $\lambda$ away from the edge

$$Hv = \lambda v, \qquad \lambda \in [-2 + \kappa, 2 - \kappa]$$

for some $\kappa > 0$, then the $\ell^p$ norm of $v$, for any $2 < p < \infty$, is bounded by

$$\|v\|_p \le Q N^{-\frac{1}{2} + \frac{1}{p}} \qquad (1.31)$$

with a very high probability, the set of exceptional events being subexponentially small in $Q$ for large $Q$. A
similar bound with a logarithmic correction holds for $p = \infty$ as well. The precise statement will be given in
Theorems 2.21 and 2.22. It is essentially a straightforward corollary of the local semicircle law, Theorem 2.5,
that was informally outlined in Section 1.4.3.

Note that if $v$ is an $\ell^2$-normalized eigenvector, then the size of the $\ell^p$-norm of $v$, for $p > 2$, gives information
about delocalization. Complete delocalization occurs when $\|v\|_p \lesssim N^{-1/2 + 1/p}$ since this corresponds to
the $\ell^p$-norm of the fully delocalized vector $v = (N^{-1/2}, N^{-1/2}, \ldots, N^{-1/2})$. In contrast, a fully localized
eigenvector, $v = (0, 0, \ldots, 0, 1, 0, \ldots, 0)$ has norm one.



1.5 Local statistics of eigenvalues: previous results. 

A central question concerning random matrices is the universality conjecture which states that local statistics 
of eigenvalues of large N x N square matrices H are determined by the symmetry type of the ensembles but 
are otherwise independent of the details of the distributions. It turns out that local statistics exhibit even
stronger universality features than the eigenvalue density.

The terminology "local statistics" refers to observables that can distinguish among individual eigenvalues.
For all ensembles we presented so far, we used a normalization such that the typical eigenvalues remain in
a compact set as $N \to \infty$, in other words, the limiting density function $\varrho$ was compactly supported. In
this case, the typical spacing between neighboring eigenvalues is of order $N^{-1}$. This holds in the bulk of the
spectrum, i.e., at a positive distance away from the spectral edges. The spectral edges are characterized by the
points where $\varrho$ goes to zero. For example, for the Wigner semicircle distribution, $\varrho_{sc}$, they are at $\pm 2$, for the
Marchenko-Pastur distribution (1.21) they are at $\lambda_\pm$, and for certain invariant ensembles the support of the
eigenvalue density might consist of several intervals, i.e., it can have more than two spectral edges.



1.5.1 Bulk universality: the sine kernel and the gap distribution 

To see individual eigenvalues and their joint distribution in the bulk spectrum, one needs to "zoom in" on
the point process of the eigenvalues by magnifying it by a factor of $N$. We fix two real numbers, $\alpha_1, \alpha_2$
and an energy $E$ with $\varrho(E) > 0$, and we ask the probability that there is an eigenvalue at $E + \alpha_1/[N\varrho(E)]$
and simultaneously there is an eigenvalue at $E + \alpha_2/[N\varrho(E)]$ (the normalization is chosen such that the
typical number of eigenvalues between these two points is independent of $E$). It turns out that the answer is
independent of the details of the ensemble and of the energy $E$, it depends only on the symmetry type. For
example, for the hermitian case, it is given by

$$\mathrm{Prob}\Big\{\text{there are eigenvalues } \lambda \in E + \frac{\alpha_1 + d\alpha_1}{N\varrho(E)} \text{ and } \lambda' \in E + \frac{\alpha_2 + d\alpha_2}{N\varrho(E)}\Big\} = \Big[1 - \Big(\frac{\sin \pi(\alpha_1 - \alpha_2)}{\pi(\alpha_1 - \alpha_2)}\Big)^2\Big]\, d\alpha_1\, d\alpha_2. \qquad (1.32)$$

The function on the r.h.s. is obtained from the celebrated sine kernel and it should be viewed as a two by
two determinant of the form

$$\det\big(K(\alpha_i - \alpha_j)\big)_{i,j=1}^2, \qquad K(x) := \frac{\sin \pi x}{\pi x}. \qquad (1.33)$$

The explicit formula for the kernel $K$ in the symmetric case is more complicated (see [22]), but it is universal
and the correlation function has the same determinantal structure.

Note that (1.32) contains much more delicate information about the eigenvalues than the semicircle law
(1.20). First, it is local information after a magnification to a scale where individual eigenvalues matter.
Second, it expresses a correlation among two eigenvalues. For example, due to $\frac{\sin y}{y} \to 1$ as $y \to 0$, we see
that the eigenvalues repel each other.
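The determinantal structure and the repulsion at short distance can be made concrete in a few lines. In this sketch the function names are ours; `np.sinc(x)` is NumPy's normalized sinc, $\sin(\pi x)/(\pi x)$, which coincides with the kernel $K$ of (1.33).

```python
import numpy as np

def sine_kernel(x):
    """K(x) = sin(pi x) / (pi x), with K(0) = 1."""
    return np.sinc(x)

def correlation_limit(alphas):
    """det( K(alpha_i - alpha_j) )_{i,j} for a list of rescaled positions."""
    alphas = np.asarray(alphas, dtype=float)
    return float(np.linalg.det(sine_kernel(alphas[:, None] - alphas[None, :])))

def two_point(a1, a2):
    """Right-hand side of (1.32) without the differentials."""
    return 1.0 - sine_kernel(a1 - a2) ** 2

close_pair = two_point(0.0, 0.01)      # nearly coinciding points
far_pair = two_point(0.0, 5.5)         # well separated points
```

Here `close_pair` is of order $\pi^2 x^2/3$ with $x = 0.01$, i.e. the probability of finding two eigenvalues very close together vanishes quadratically, while `far_pair` is close to one, as for independent points.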

In general, the $k$-th correlation functions (or $k$-point marginals) give information about the joint behavior
of a $k$-tuple of eigenvalues. Their definition is as follows:

Definition 1.2 Let $p_N(\lambda_1, \lambda_2, \ldots, \lambda_N)$ be the joint symmetrized probability distribution of the eigenvalues.
For any $k \ge 1$, the $k$-point correlation function is defined by

$$p_N^{(k)}(\lambda_1, \lambda_2, \ldots, \lambda_k) := \int_{\mathbb{R}^{N-k}} p_N(\lambda_1, \ldots, \lambda_k, \lambda_{k+1}, \ldots, \lambda_N)\, d\lambda_{k+1} \cdots d\lambda_N. \qquad (1.34)$$

Remark. We usually label the eigenvalues in increasing order. For the purpose of this definition, however,
we dropped this restriction and we consider $p_N(\lambda_1, \lambda_2, \ldots, \lambda_N)$ to be a symmetric function of $N$ variables,
$\lambda = (\lambda_1, \ldots, \lambda_N)$ on $\mathbb{R}^N$. Alternatively, one could consider the density $\tilde{p}_N(\lambda) = N!\, p_N(\lambda) \cdot \mathbf{1}(\lambda \in \Xi^{(N)})$,
where

$$\Xi^{(N)} := \{\lambda_1 < \lambda_2 < \ldots < \lambda_N\} \subset \mathbb{R}^N.$$



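The significance of the $k$-point correlation functions is that they give the expectation value of observables
(functions) $O$ depending on $k$-tuples of eigenvalues via the formula

$$\frac{(N-k)!}{N!}\, \mathbb{E} \sum_{i_1, i_2, \ldots, i_k = 1}^N O(\lambda_{i_1}, \lambda_{i_2}, \ldots, \lambda_{i_k}) = \int_{\mathbb{R}^k} O(x_1, x_2, \ldots, x_k)\, p_N^{(k)}(x_1, x_2, \ldots, x_k)\, dx_1 \cdots dx_k,$$

where the summation is over all distinct indices $i_1, i_2, \ldots, i_k$ and the prefactor is a normalization of the sum.

For example, the one-point function $p_N^{(1)}$ expresses the density, in particular, by choosing the observable
$O(x) = \mathbf{1}(x \in [a, b])$ to be the characteristic function of $[a, b]$, we have

$$\frac{1}{N}\mathbb{E}\, \#\{i : \lambda_i \in [a, b]\} = \frac{1}{N}\mathbb{E} \sum_{i=1}^N O(\lambda_i) = \int O(x)\, p_N^{(1)}(x)\, dx = \int_a^b p_N^{(1)}(x)\, dx.$$

Therefore, the Wigner semicircle law (1.20) states that $p_N^{(1)}$ converges weakly to $\varrho_{sc}$ as $N \to \infty$.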

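The sine kernel universality in the hermitian case expresses that the (weak) limit of the rescaled $k$-point
correlation function, as $N \to \infty$, is given by the determinant of $K(x)$ from (1.33), i.e.,

$$\frac{1}{[\varrho_{sc}(E)]^k}\, p_N^{(k)}\Big(E + \frac{\alpha_1}{N\varrho_{sc}(E)}, E + \frac{\alpha_2}{N\varrho_{sc}(E)}, \ldots, E + \frac{\alpha_k}{N\varrho_{sc}(E)}\Big) \rightharpoonup \det\big(K(\alpha_i - \alpha_j)\big)_{i,j=1}^k \qquad (1.35)$$

for any fixed $E$ with $|E| < 2$, as a weak convergence of functions in the variables $(\alpha_1, \ldots, \alpha_k)$.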

Once the $k$-point correlation functions are identified, it is easy to derive limit theorems for other quantities
related to individual eigenvalues. The most interesting one is the gap distribution, i.e., the distribution of the
difference of neighboring eigenvalues, $\lambda_{j+1} - \lambda_j$. Note that it apparently involves only two eigenvalues, but
it is not expressible solely by the two-point correlation function, since the two eigenvalues must be consecutive.
Nevertheless, the gap distribution can be expressed in terms of all correlation functions as follows.

Fix an energy $E$ with $|E| < 2$. For $s > 0$ and for some $N$-dependent parameter $t$ with $1/N \ll t \ll 1$ let

$$\Lambda(s) = \Lambda_N(s) := \frac{1}{2Nt\varrho_{sc}(E)}\, \#\Big\{1 \le j \le N-1 : \lambda_{j+1} - \lambda_j \le \frac{s}{N\varrho_{sc}(E)},\ |\lambda_j - E| \le t\Big\},$$

i.e., the proportion of rescaled eigenvalue differences below a threshold $s$ in a large but still microscopic
vicinity of an energy $E$. Let $\mathcal{K}_\alpha$ be the operator acting on $L^2((0, \alpha))$ with integral kernel $K(x, y) := \frac{\sin \pi(x - y)}{\pi(x - y)}$.
Then for any $E$ with $|E| < 2$ and for any $s > 0$ we have

$$\lim_{N\to\infty} \mathbb{E}\, \Lambda_N(s) = \int_0^s p(\alpha)\, d\alpha, \qquad p(\alpha) := \frac{d^2}{d\alpha^2} \det(1 - \mathcal{K}_\alpha), \qquad (1.36)$$

where det denotes the Fredholm determinant of the operator $1 - \mathcal{K}_\alpha$ (note that $\mathcal{K}_\alpha$ is a compact operator).
The density function $p(s)$ of the nearest neighbor eigenvalue spacing behaves, with a very good but not
exact approximation (called the Wigner surmise), as $p(s) \approx \frac{\pi s}{2} e^{-\pi s^2/4}$ for the symmetric case and
$p(s) \approx 32\pi^{-2} s^2 e^{-4s^2/\pi}$ for the hermitian case [70].

Note that this behavior is in sharp contrast to the level spacing statistics of the Poisson point process,
where the corresponding density is $p(s) = e^{-s}$ (after rescaling the process so that the mean distance is one).
In particular, random matrices exhibit level repulsion whose strength depends on the symmetry class (note
the different behavior of $p(s)$ near $s \approx 0$).
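The Fredholm determinant in (1.36) is easy to approximate numerically, which gives a direct way to compare the hermitian spacing density with the Wigner surmise and with the Poisson case. The sketch below is our own illustration, not the method of the cited references: it discretizes the sine kernel on $(0, \alpha)$ with Gauss-Legendre quadrature (a Nyström-type approximation) and takes the second derivative by a central finite difference. The number of nodes and the step `h` are ad hoc choices.

```python
import numpy as np

def gap_probability(alpha, n_nodes=40):
    """Quadrature approximation of det(1 - K_alpha) on L^2((0, alpha))."""
    if alpha == 0.0:
        return 1.0
    t, w = np.polynomial.legendre.leggauss(n_nodes)   # nodes and weights on [-1, 1]
    x = 0.5 * alpha * (t + 1.0)                       # mapped to (0, alpha)
    w = 0.5 * alpha * w
    kernel = np.sinc(x[:, None] - x[None, :])         # sin(pi u) / (pi u)
    sw = np.sqrt(w)
    return float(np.linalg.det(np.eye(n_nodes) - sw[:, None] * kernel * sw[None, :]))

def spacing_density(s, h=1e-2):
    """p(s): second derivative of the determinant, by a central difference."""
    return (gap_probability(s + h) - 2.0 * gap_probability(s) + gap_probability(s - h)) / h ** 2

def surmise_hermitian(s):
    """Wigner surmise for the hermitian case."""
    return 32.0 / np.pi ** 2 * s ** 2 * np.exp(-4.0 * s ** 2 / np.pi)

def poisson_density(s):
    return np.exp(-s)
```

Evaluating `spacing_density` on a grid and plotting it against `surmise_hermitian` and `poisson_density` shows the two features discussed above: the surmise is close to, but not identical with, the determinantal formula, and both vanish at $s = 0$ whereas the Poisson density is maximal there.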

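For the proof of (1.36), we can use the exclusion-inclusion formula to express

$$\mathbb{E}\, \Lambda(s) = \frac{1}{2Nt\varrho} \sum_{m=2}^\infty (-1)^m \binom{N}{m} \int_{-t}^{t} dv_1 \cdots \int_{-t}^{t} dv_m\, \mathbf{1}\Big\{\max_{i,j} |v_i - v_j| \le \frac{s}{N\varrho}\Big\}\, p_N^{(m)}(E + v_1, E + v_2, \ldots, E + v_m), \qquad (1.37)$$

where $\varrho = \varrho_{sc}(E)$. After a change of variables,

$$\mathbb{E}\, \Lambda(s) = \frac{1}{2Nt\varrho} \sum_{m=2}^\infty (-1)^m \int_{-N\varrho t}^{N\varrho t} dz_1 \cdots \int_{-N\varrho t}^{N\varrho t} dz_m\, \binom{N}{m} \frac{1}{(N\varrho)^m}\, p_N^{(m)}\Big(E + \frac{z_1}{N\varrho}, \ldots, E + \frac{z_m}{N\varrho}\Big)\, \mathbf{1}\Big\{\max_{i,j} |z_i - z_j| \le s\Big\}$$

$$= \frac{1}{2Nt\varrho} \sum_{m=2}^\infty (-1)^m\, m \int_{-N\varrho t}^{N\varrho t} dz_1 \int_0^s d\alpha_2 \cdots \int_0^s d\alpha_m\, \binom{N}{m} \frac{1}{(N\varrho)^m}\, p_N^{(m)}\Big(E + \frac{z_1}{N\varrho}, E + \frac{z_1 + \alpha_2}{N\varrho}, \ldots, E + \frac{z_1 + \alpha_m}{N\varrho}\Big), \qquad (1.38)$$

where the factor $m$ comes from considering the integration sector $z_1 \le z_j$, $j \ge 2$. Taking $N \to \infty$ and using
(1.35), we get

$$\lim_{N\to\infty} \mathbb{E}\, \Lambda(s) = \sum_{m=2}^\infty \frac{(-1)^m}{(m-1)!} \int_0^s d\alpha_2 \cdots \int_0^s d\alpha_m\, \det\Big(\frac{\sin \pi(\alpha_i - \alpha_j)}{\pi(\alpha_i - \alpha_j)}\Big)_{i,j=1}^m, \qquad (1.39)$$

where in the last determinant term we set $\alpha_1 = 0$. The interchange of the limit and the summation can be
easily justified by an alternating series argument. We note that the right hand side of (1.39) is $\int_0^s p(\alpha)\, d\alpha$,
where $p(\alpha)$ is the second derivative of the Fredholm determinant $\det(1 - \mathcal{K}_\alpha)$ given in (1.36) (see [87] or [4]
for more details). We thus have

$$\lim_{N\to\infty} \mathbb{E}\, \Lambda_N(s) = \int_0^s p(\alpha)\, d\alpha. \qquad (1.40)$$

1.5.2 Edge universality: the Airy kernel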

Near the spectral edges and under a different scaling another type of universality emerges. It also has a
determinantal form, but the kernel is given by the Airy kernel,

$$A(x, y) := \frac{\mathrm{Ai}(x)\mathrm{Ai}'(y) - \mathrm{Ai}'(x)\mathrm{Ai}(y)}{x - y},$$

where $\mathrm{Ai}(x)$ is the Airy function, i.e.,

$$\mathrm{Ai}(x) = \frac{1}{\pi}\int_0^\infty \cos\Big(\frac{1}{3}t^3 + xt\Big)\, dt,$$

which is the solution to the second order differential equation, $y'' - xy = 0$, with vanishing boundary condition
at $x = \infty$. The result, that is analogous to (1.35), at the upper spectral edge $E = 2$ of the hermitian Wigner
matrices, is the following weak limit as $N \to \infty$:

$$p_N^{(k)}\Big(2 + \frac{\alpha_1}{N^{2/3}}, 2 + \frac{\alpha_2}{N^{2/3}}, \ldots, 2 + \frac{\alpha_k}{N^{2/3}}\Big) \rightharpoonup \det\big(A(\alpha_i, \alpha_j)\big)_{i,j=1}^k. \qquad (1.41)$$

A similar statement holds at the lower spectral edge, $E = -2$. For Wigner matrices this was first proved by
Soshnikov [91] following the work of Sinai and Soshnikov [88] and recently a different proof was given by Tao
and Vu [97] and by Erdős, Yau and Yin [50], see Section 1.6.6. Note the different magnification factor $N^{2/3}$
that expresses the fact that near the edge the typical eigenvalue spacing is $N^{-2/3}$. Intuitively, this spacing
is consistent with the semicircle law, since

$$\#\{\lambda_j \ge 2 - \varepsilon\} \approx \frac{N}{2\pi}\int_{2-\varepsilon}^2 \sqrt{4 - x^2}\, dx \approx \frac{2}{3\pi}\varepsilon^{3/2} N,$$

so we expect finitely many eigenvalues at a distance $\varepsilon \sim N^{-2/3}$ away from the edge. Note however, that
this argument is not rigorous, since the semicircle law (1.20) requires the test interval $[a, b]$ to be fixed,
independent of $N$. Recently we proved a strong form of the local semicircle law in [50] (see Theorem 2.19
later) which rigorously justifies this argument.
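The heuristic count of eigenvalues near the edge can be compared with a sample. The following sketch assumes a hermitian Gaussian matrix with $\mathbb{E}|h_{ij}|^2 = 1/N$ and an illustrative window $\varepsilon = 0.3$; the prediction is the leading-order expression $\frac{2}{3\pi}\varepsilon^{3/2} N$ from the display above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
h = (a + a.conj().T) / (2 * np.sqrt(n))        # hermitian, E|h_ij|^2 = 1/n
eigs = np.linalg.eigvalsh(h)

eps = 0.3
observed = int(np.sum(eigs >= 2.0 - eps))              # eigenvalues within eps of the upper edge
predicted = 2.0 / (3.0 * np.pi) * eps ** 1.5 * n       # leading-order semicircle count
```

Setting the prediction equal to an order-one number and solving for `eps` reproduces the scale $\varepsilon \sim N^{-2/3}$ quoted in the text.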

The largest eigenvalue $\lambda_N$ may extend above 2, but not more than by $O(N^{-2/3})$. More precisely, the
distribution function of the largest eigenvalue is given by another universal function, the Tracy-Widom
distribution [101]

$$\lim_{N\to\infty} \mathbb{P}\Big(\lambda_N \le 2 + \frac{s}{N^{2/3}}\Big) = F_{2,1}(s) := \exp\Big(-\int_s^\infty (x - s)\, q^2(x)\, dx\Big),$$

where $q(s)$ is the solution to the Painlevé II differential equation $q''(s) = sq(s) + 2q^3(s)$ with asymptotics
$q(s) \sim \mathrm{Ai}(s)$ at $s = +\infty$ as a boundary condition. One can prove that

$$F_{2,1}(s) \sim 1 - \frac{1}{16\pi s^{3/2}}\, e^{-\frac{4}{3}s^{3/2}}$$

as $s \to \infty$, i.e., eigenvalues beyond the $O(N^{-2/3})$ scale are superexponentially damped. A similar formula
holds for symmetric matrices as well [102]. Note that, in particular, this result precisely identifies the
limiting distribution of the norm of a large Wigner matrix.

The edge universality is commonly approached via the moment method presented in Section 1.4. The
error term in (1.22) deteriorates as $k$ increases, but with a more careful classification and evaluation of the
possible pairing structure, it is possible to determine the moments up to order $k = O(N^{2/3})$, see [91]. We
just mention the simpler result

$$\frac{1}{N}\mathbb{E}\, \mathrm{Tr}\, H^{2k} = \frac{2^{2k}}{\sqrt{\pi k^3}}\big(1 + o(1)\big) \qquad (1.42)$$

as long as $k = o(N^{2/3})$. Such precision is sufficient to identify the upper spectral edge of $H$ with a precision
almost $N^{-2/3}$ since

$$\mathbb{P}(\lambda_N \ge 2 + \varepsilon) \le \frac{\mathbb{E}\, \mathrm{Tr}\, H^{2k}}{(2 + \varepsilon)^{2k}} \le \frac{CN}{k^{3/2}(1 + \varepsilon)^k} = o(1)$$

if $\varepsilon \ge (\log N)/k \gg N^{-2/3}$. The computation (1.42) can be refined to include powers of order $k \sim N^{2/3}$
and identify the common distribution of the largest eigenvalues precisely [91]. We remark that the original
work of Soshnikov assumed that the single entry distribution is symmetric and all its moments are finite;
this condition has been subsequently relaxed [81, 77, 98].
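The right-hand side of (1.42) is the large-$k$ asymptotics of the Catalan numbers appearing in (1.22). A short check of this elementary fact, with helper names of our own choosing:

```python
import math

def catalan(k):
    """Catalan number C_k = binom(2k, k) / (k + 1), computed exactly."""
    return math.comb(2 * k, k) // (k + 1)

def catalan_asymptotic(k):
    """Leading-order asymptotics 2^{2k} / sqrt(pi k^3)."""
    return 4.0 ** k / math.sqrt(math.pi * k ** 3)

# ratio C_k / (2^{2k} / sqrt(pi k^3)) for increasing k
ratios = {k: catalan(k) / catalan_asymptotic(k) for k in (10, 50, 200)}
```

The ratios increase towards one from below, so the $(1 + o(1))$ factor in (1.42) is already visible at moderate $k$ at the level of the combinatorial main term. The probabilistic content of (1.42) is that this main term still dominates when $k$ grows with $N$.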

The moment method does not seem to be applicable beyond Soshnikov's scale, i.e., for $k$ much larger
than $N^{2/3}$. On the other hand, bulk universality would require to compute moments up to $k \sim O(N)$ since
$1/k$ is essentially the resolution scale for which knowing the moments of order $k$ precisely still gives some
information. The proof of the bulk universality requires completely new methods.

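We mention a useful rule of thumb. There is a strong relation among controlling $e^{-itH}$, $H^k$ and $(H - z)^{-1}$.
Modulo some technicalities and logarithmic factors, the following three statements are roughly equivalent
for any $0 < \varepsilon \ll 1$:

$e^{-itH}$ can be controlled up to times $|t| \le \varepsilon^{-1}$
$H^k$ can be controlled up to powers $k \le \varepsilon^{-1}$
$(H - z)^{-1}$ can be controlled down to $\mathrm{Im}\, z = \eta \ge \varepsilon$.

These relations follow from the standard identities

$$\frac{1}{H - z} = i\int_0^\infty e^{-it(H - z)}\, dt, \qquad z = E + i\eta, \quad \eta > 0,$$

$$e^{-itH} = \sum_{k=0}^\infty \frac{(-itH)^k}{k!} = -\frac{1}{2\pi i}\oint_\gamma \frac{e^{-itz}}{H - z}\, dz$$

(where the contour $\gamma$ encircles the spectrum of $H$).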
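The link between time scales and the spectral resolution $\eta$ can be seen directly from the first identity: truncating the time integral at $t_{\max}$ leaves a relative error $e^{-t_{\max}\eta}$, so the resolvent at distance $\eta$ from the real axis requires control of $e^{-itH}$ up to times of order $\eta^{-1}$. A minimal sketch on the level of eigenvalues, where the grid `lam` is a hypothetical stand-in for the spectrum of $H$ and the truncated integral is evaluated in closed form:

```python
import numpy as np

def truncated_resolvent(lam, z, t_max):
    """i * int_0^{t_max} exp(-i t (lam - z)) dt, in closed form, for spectral values lam."""
    return (1.0 - np.exp(-1j * t_max * (lam - z))) / (lam - z)

lam = np.linspace(-2.0, 2.0, 9)      # stand-in eigenvalues of H (illustrative)
eta = 0.01
z = 0.3 + 1j * eta
exact = 1.0 / (lam - z)              # resolvent in the eigenbasis

def relative_error(t_max):
    approx = truncated_resolvent(lam, z, t_max)
    return float(np.max(np.abs(approx - exact) / np.abs(exact)))

error_short = relative_error(1.0 / eta)     # times of order 1/eta
error_long = relative_error(10.0 / eta)     # ten times longer
```

With $t_{\max} = 1/\eta$ the error is still $e^{-1}$, while a modest multiple of $1/\eta$ makes it negligible, in line with the "logarithmic factors" mentioned in the rule of thumb.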



1.5.3 Invariant ensembles 

For ensembles that remain invariant under the transformations $H \to U^*HU$ for any unitary matrix $U$ (or, in
case of symmetric matrices $H$, for any orthogonal matrix $U$), the joint probability density function of all the
$N$ eigenvalues can be explicitly computed. These ensembles are typically given by the probability density
(1.30). The eigenvalues are strongly correlated and they are distributed according to a Gibbs measure with a
long range logarithmic interaction potential (this connection was exploited first in [30]). The joint probability
density of the eigenvalues of $H$ can be computed explicitly:
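$$p_N(\lambda_1, \lambda_2, \dots, \lambda_N) = \mathrm{const.} \prod_{i < j} |\lambda_i - \lambda_j|^\beta \, e^{-N\sum_{j=1}^N V(\lambda_j)}, \qquad (1.43)$$

where $\beta = 1$ for symmetric and $\beta = 2$ for hermitian ensembles. In particular, for the Gaussian case, $V$ is
quadratic and thus the joint distribution of the GOE ($\beta = 1$) and GUE ($\beta = 2$) eigenvalues is given by

$$p_N(\lambda_1, \lambda_2, \dots, \lambda_N) = \mathrm{const.} \prod_{i < j} |\lambda_i - \lambda_j|^\beta \, e^{-\frac{1}{4}\beta N\sum_{j=1}^N \lambda_j^2}. \qquad (1.44)$$

It is often useful to think of this measure as a Gibbs measure of the form

$$\mu_N(\mathrm{d}\lambda) = p_N(\lambda)\,\mathrm{d}\lambda = \frac{e^{-N\mathcal{H}(\lambda)}}{Z}\,\mathrm{d}\lambda, \qquad \mathcal{H}(\lambda) := \sum_{i=1}^N V(\lambda_i) - \frac{\beta}{N}\sum_{i < j} \log|\lambda_j - \lambda_i| \qquad (1.45)$$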
with the confining potential $V(\lambda) = \frac{\beta}{4}\lambda^2$. The proof of (1.43) is a direct (but involved) calculation; it is based
upon a change of variable. We sketch it for the hermitian (unitary invariant) case. The key observation is
that invariance of the measure under conjugation implies that the eigenvalues, organized in a diagonal matrix
$D = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N)$, are independent of the eigenvectors, organized in a unitary matrix $U$. Writing
$H = UDU^*$, one obtains that $\mathcal{P}(H)\mathrm{d}H$ factorizes as

$$\mathcal{P}(H)\,\mathrm{d}H = e^{-N\,\mathrm{Tr}\,V(H)}\,\mathrm{d}H = p_N(\lambda_1, \lambda_2, \dots, \lambda_N)\,\mathrm{d}\lambda_1 \cdots \mathrm{d}\lambda_N\,\mathrm{d}U,$$

where $\mathrm{d}U$ denotes the uniform (Haar) measure on the unitary group $U(N)$ and $p_N$ is the induced density
function on the diagonal matrices (or its entries). Thus the computation of the function $p_N$ amounts to
computing the Jacobian of the change of variables from the matrix elements of $H$ to the parametrization
coordinates in terms of eigenvalues and eigenvectors. The result is

$$\mathrm{d}H = (\mathrm{const.})\,|\Delta_N(\lambda)|^\beta\,\mathrm{d}\lambda\,\mathrm{d}U, \qquad \Delta_N(\lambda) := \prod_{i < j}(\lambda_i - \lambda_j), \qquad (1.46)$$

where $\beta = 1$ is the symmetric and $\beta = 2$ is the hermitian case, see [4] or Section 3.1-3.3 of [70] for details.

Especially remarkable is the emerging Vandermonde determinant in (1.43), which directly comes from
integrating out the Haar measure. Note that the symmetry type of the ensemble appears through the
exponent $\beta$. Only the cases $\beta = 1, 2$ or $4$ correspond to matrix ensembles of the form (1.30), namely, to the
symmetric, hermitian and quaternion self-dual matrices. We will not give the precise definition of the latter
(see, e.g. Chapter 7 of [70] or [46]); we just mention that this is the natural generalization of symmetric or
hermitian matrices to quaternion entries and they have real eigenvalues.

Irrespective of any underlying matrix ensemble, one can nevertheless study the distribution (1.43) for
any $\beta > 0$; these are called the general $\beta$-ensembles. In fact, for the Gaussian case, $V(\lambda) = \lambda^2/2$, there are
corresponding tridiagonal matrix ensembles for any $\beta > 0$, obtained from successive Householder
transformations, whose eigenvalue distribution is described by (1.43), see [29] for an overview. Using the tridiagonal
structure, methods from the theory of Jacobi matrices can be applied. For example, the universality of
the edge eigenvalues is understood in the sense that they are shown to converge to the lowest eigenvalues
of a one dimensional Schrödinger operator with a white noise drift and, in particular, the $\beta$-analogue of
the Tracy-Widom distribution has been identified in [79, 78] following the conjectures of [33]. A different
method, the Brownian carousel representation [103], has been used to generalize the tail distribution of large
eigenvalue gaps ("Wigner surmise") for Gaussian $\beta$-ensembles [104]. More precisely, it has been shown that
the probability under the distribution (1.43) with Gaussian $V$ that there is no point falling into a fixed
interval of length $s$ (after locally rescaling the spectrum so that the typical distance is $2\pi$) is given by

$$(\kappa_\beta + o(1))\, s^{\gamma_\beta} \exp\Big(-\frac{\beta}{64}s^2 + \Big(\frac{\beta}{8} - \frac{1}{4}\Big)s\Big), \qquad \gamma_\beta := \frac{1}{4}\Big(\frac{\beta}{2} + \frac{2}{\beta} - 3\Big), \quad \kappa_\beta > 0,$$

as $s \to \infty$ (after the $N \to \infty$ limit).
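A tridiagonal model of this kind is easy to sample. The sketch below assumes the Dumitriu-Edelman form of the construction (normal entries on the diagonal, chi-distributed entries on the off-diagonal); the text only states that such ensembles exist, so the specific entry distributions, the scaling by $\sqrt{\beta N}$ chosen to match the normalization of (1.44), and the function name are assumptions of this example.

```python
import numpy as np


def sample_gaussian_beta_eigenvalues(n, beta, rng):
    """Eigenvalues of a tridiagonal model for the Gaussian beta-ensemble.

    Assumed construction: independent N(0, 2) entries on the diagonal and
    chi-distributed entries with beta*(n-1), ..., beta degrees of freedom on
    the off-diagonal, all divided by sqrt(beta * n), so that the eigenvalue
    density is proportional to
    prod |l_i - l_j|^beta * exp(-beta * n * sum(l_j^2) / 4).
    """
    diag = np.sqrt(2.0) * rng.standard_normal(n)
    dof = beta * np.arange(n - 1, 0, -1)
    off = np.sqrt(rng.chisquare(dof))
    T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(T / np.sqrt(beta * n))


rng = np.random.default_rng(1)
n = 200
for beta in (0.5, 1.0, 2.0, 4.0):
    lam = sample_gaussian_beta_eigenvalues(n, beta, rng)
    # For every beta the spectrum should fill roughly [-2, 2].
    print(beta, lam.min(), lam.max(), np.mean(lam ** 2))
```

Since only $2N - 1$ random numbers are needed, this is a convenient way to experiment with non-classical values of $\beta$ for which no full matrix ensemble is available.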

The bulk universality, i.e., the analogue of the sine-kernel behavior (1.35) for general $\beta$-ensembles is
unproven, even for the Gaussian case. The main difficulty is that (1.43) represents an $N$-particle system
with a long range interaction. We can write the joint density as a Gibbs measure (1.45); we have $N$ particles
in a confining potential $V$ that repel each other with a potential that has locally a logarithmic repulsion,
but also a large (in fact increasing) long range component. Standard methods from statistical physics to
construct and analyse Gibbs measures do not seem to apply. Here we do not even attempt to construct
the analogue of an infinite volume Gibbs measure; we only want to compute correlation functions, but even
this is a daunting task with standard methods unless an extra structure is found.



1.5.4 Universality of classical invariant ensembles via orthogonal polynomials 

Much more is known about the classical invariant ensembles, i.e., the $\beta = 1, 2, 4$ cases, with a general
potential $V$. For these specific values an extra mathematical structure emerges, namely the orthogonal
polynomials with respect to the weight function $e^{-NV(x)}$ on the real line. This approach was originally
applied by Mehta and Gaudin [70, 72] to compute the gap distribution for the Gaussian case that involved
classical Hermite orthonormal polynomials. Dyson [32] computed the local correlation functions for a related
ensemble (circular ensemble) that was extended to the standard Gaussian ensembles by Mehta [71]. Later
a general method using orthogonal polynomials has been developed to tackle a very general class of unitary
ensembles (see, e.g. [14, 22, 23, 24, 52, 70, 75] and references therein).

For simplicity, to illustrate the connection, we will consider the hermitian case $\beta = 2$ with a Gaussian
potential $V(\lambda) = \lambda^2/2$ (which, by Lemma 1.1, is also a Wigner matrix ensemble, namely the GUE). To
simplify the presentation further, for the purpose of this argument only, we rescale the eigenvalues $\lambda \to \sqrt{N}\lambda$,
which effectively removes the factor $N$ from the exponent in (1.43). (This pure scaling works only in the
Gaussian case, but it is only a technical convenience to simplify formulas.)

Let $P_k(x)$ be the $k$-th orthogonal polynomial with respect to the weight function $e^{-x^2/2}$ with leading
coefficient 1. Let

$$\psi_k(x) := \frac{e^{-x^2/4}P_k(x)}{\| e^{-x^2/4}P_k \|}$$

be the corresponding orthonormal function, i.e.,

$$\int \psi_k(x)\psi_\ell(x)\,\mathrm{d}x = \delta_{k,\ell}. \qquad (1.47)$$

In the particular case of the Gaussian weight function, $P_k$ is given by the Hermite polynomials

$$P_k(x) = H_k(x) := (-1)^k e^{x^2/2}\frac{\mathrm{d}^k}{\mathrm{d}x^k}e^{-x^2/2}$$

and

$$\psi_k(x) = \frac{P_k(x)}{(2\pi)^{1/4}(k!)^{1/2}}\,e^{-x^2/4},$$

but for the following discussion we will not need these explicit formulae.
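As a sanity check on the normalization $(2\pi)^{1/4}(k!)^{1/2}$, the orthonormality relation (1.47) can be verified with Gauss quadrature for the weight $e^{-x^2/2}$. This is a minimal sketch assuming NumPy's `numpy.polynomial.hermite_e` module (probabilists' Hermite polynomials); the function name `psi` and the choice of 40 quadrature nodes are illustrative.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He


def psi(k, x):
    """Orthonormal Hermite function He_k(x) e^{-x^2/4} / ((2 pi)^{1/4} sqrt(k!))."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    norm = (2.0 * math.pi) ** 0.25 * math.sqrt(math.factorial(k))
    return He.hermeval(x, coeffs) * np.exp(-x ** 2 / 4.0) / norm


# Gauss quadrature for the weight e^{-x^2/2}; exact for polynomials of degree < 80.
nodes, weights = He.hermegauss(40)
K = 8
gram = np.empty((K, K))
for k in range(K):
    for l in range(K):
        # psi_k psi_l = (polynomial) * e^{-x^2/2}; divide out the weight before applying the rule
        integrand = psi(k, nodes) * psi(l, nodes) * np.exp(nodes ** 2 / 2.0)
        gram[k, l] = np.sum(weights * integrand)

print(np.max(np.abs(gram - np.eye(K))))   # deviation of the Gram matrix from the identity
```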

The key observation is that by simple properties of the Vandermonde determinant, we have

$$\Delta_N(\mathbf{x}) = \prod_{1 \le i < j \le N}(x_i - x_j) = \det\big(x_i^{j-1}\big)_{i,j=1}^N = \det\big(P_{j-1}(x_i)\big)_{i,j=1}^N, \qquad (1.48)$$

exploiting that $P_j(x) = x^j + \dots$ is a polynomial of degree $j$ with leading coefficient equal to one. Define the
kernel

$$K_N(x,y) := \sum_{k=0}^{N-1}\psi_k(x)\psi_k(y),$$

i.e., the projection kernel onto the subspace spanned by the first $N$ orthonormal functions. Then (1.48)
immediately implies

$$p_N(x_1, \dots, x_N) = C_N\Big[\det\big(P_{j-1}(x_i)\big)_{i,j=1}^N\Big]^2\prod_{i=1}^N e^{-x_i^2/2} = C_N'\Big[\det\big(\psi_{j-1}(x_i)\big)_{i,j=1}^N\Big]^2 = C_N'\det\big(K_N(x_i, x_j)\big)_{i,j=1}^N,$$

where in the last step we used that the product of the matrix $\big(\psi_{j-1}(x_i)\big)_{i,j=1}^N$ with its transpose is exactly
$\big(K_N(x_i, x_j)\big)_{i,j=1}^N$, and we did not follow the precise constants for simplicity.

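To compute the correlation functions, we expand the determinant (the value of the constant $C_{k,N}$ may change
from line to line):

$$\begin{aligned}
p_N^{(k)}(x_1, \dots, x_k) &= C_{k,N}\int \det\big(K_N(x_i, x_j)\big)_{i,j=1}^N \prod_{i=k+1}^N \mathrm{d}x_i \\
&= C_{k,N}\sum_{\sigma,\tau \in S_N}(-1)^{\tau+\sigma}\int \prod_{j=1}^N \psi_{\sigma(j)-1}(x_j)\,\psi_{\tau(j)-1}(x_j)\prod_{i=k+1}^N \mathrm{d}x_i \\
&= C_{k,N}\sum_{\alpha_1 < \alpha_2 < \dots < \alpha_k}\ \sum_{\sigma,\tau \in S_k}(-1)^{\tau+\sigma}\prod_{j=1}^k \psi_{\alpha_{\sigma(j)}-1}(x_j)\,\psi_{\alpha_{\tau(j)}-1}(x_j) \\
&= C_{k,N}\sum_{\alpha_1 < \alpha_2 < \dots < \alpha_k}\Big[\det\big(\psi_{\alpha_j-1}(x_i)\big)_{i,j=1}^k\Big]^2,
\end{aligned}$$

where $S_N$ is the permutation group on $N$ elements and $(-1)^\tau$ is the parity character of the permutation.
In the third line we used (1.47) to perform the integrations that have set $\sigma(j) = \tau(j)$ for all $j \ge k+1$
and we denoted by $\{\alpha_1, \alpha_2, \dots, \alpha_k\}$ the ordering of the set $\{\sigma(1), \sigma(2), \dots, \sigma(k)\} = \{\tau(1), \tau(2), \dots, \tau(k)\}$;
in that line $\sigma$ and $\tau$ denote the induced permutations of $k$ elements.
Finally, using that the matrix $[K_N(x_i, x_j)]_{i,j=1}^k$ can be written as $A^*A$ with $A_{ij} = \psi_{i-1}(x_j)$
($1 \le i \le N$, $1 \le j \le k$) and using the
Cauchy-Binet expansion formula for the determinant of a product matrix, we get

$$\det[K_N(x_i, x_j)]_{i,j=1}^k = \sum_{\alpha_1 < \alpha_2 < \dots < \alpha_k}\Big[\det\big(\psi_{\alpha_j-1}(x_i)\big)_{i,j=1}^k\Big]^2.$$

Apart from the constant, that can be computed, we thus proved that

$$p_N^{(k)}(x_1, \dots, x_k) = \frac{(N-k)!}{N!}\det[K_N(x_i, x_j)]_{i,j=1}^k,$$

i.e., the correlation functions have a determinantal structure.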
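The Cauchy-Binet step can be verified numerically for small $N$ and $k$. The sketch below assumes the Gaussian weight, so that the $\psi_k$ are Hermite functions generated by the three-term recurrence $x\psi_k = \sqrt{k+1}\,\psi_{k+1} + \sqrt{k}\,\psi_{k-1}$; the helper name `psi_table` and the test points are illustrative.

```python
import itertools
import math
import numpy as np


def psi_table(n_max, x):
    """Rows psi_0(x), ..., psi_{n_max-1}(x) of orthonormal Hermite functions,
    built with the recurrence x psi_k = sqrt(k+1) psi_{k+1} + sqrt(k) psi_{k-1}."""
    x = np.asarray(x, dtype=float)
    table = np.empty((n_max, x.size))
    table[0] = np.exp(-x ** 2 / 4.0) / (2.0 * math.pi) ** 0.25
    if n_max > 1:
        table[1] = x * table[0]
    for k in range(1, n_max - 1):
        table[k + 1] = (x * table[k] - math.sqrt(k) * table[k - 1]) / math.sqrt(k + 1)
    return table


N, k = 6, 3
x = np.array([-0.7, 0.2, 1.1])            # k arbitrary test points
A = psi_table(N, x)                        # A[i, j] = psi_i(x_j), an N x k matrix
kernel = A.T @ A                           # K_N(x_i, x_j) = sum_l psi_l(x_i) psi_l(x_j)

lhs = np.linalg.det(kernel)
# Sum of squared k x k minors over all row subsets alpha_1 < ... < alpha_k.
rhs = sum(np.linalg.det(A[list(rows), :]) ** 2
          for rows in itertools.combinations(range(N), k))
print(lhs, rhs)
```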

In order to see the sine-kernel (1.33) emerging, we need a basic algebraic property of the orthogonal
polynomials, the Christoffel-Darboux formula:

$$K_N(x,y) = \sum_{j=0}^{N-1}\psi_j(x)\psi_j(y) = \sqrt{N}\left[\frac{\psi_N(x)\psi_{N-1}(y) - \psi_N(y)\psi_{N-1}(x)}{x - y}\right].$$

Furthermore, orthogonal polynomials of high degree have the asymptotic behavior, as $N \to \infty$,

$$\psi_{2m}(x) = \frac{(-1)^m}{N^{1/4}\sqrt{\pi}}\cos\big(\sqrt{N}x\big) + o(N^{-1/4}), \qquad \psi_{2m+1}(x) = \frac{(-1)^m}{N^{1/4}\sqrt{\pi}}\sin\big(\sqrt{N}x\big) + o(N^{-1/4}), \qquad (1.49)$$

for any $m$ such that $|2m - N| \le C$. The approximation is uniform for $|x| \le CN^{-1/2}$. These formulas will be
useful if we set $E = 0$ in (1.35), since we rescaled the eigenvalues by a factor of $\sqrt{N}$, so the relation between
the notation of (1.35) (for $k = 2$) and $x, y$ is

$$x = \sqrt{N}\Big(E + \frac{\alpha_1}{N\varrho(E)}\Big), \qquad y = \sqrt{N}\Big(E + \frac{\alpha_2}{N\varrho(E)}\Big). \qquad (1.50)$$

For different values of $E$ one needs somewhat different asymptotic formulae for the orthogonal polynomials.
We can thus compute that

$$K_N(x,y) \approx \frac{1}{\pi}\,\frac{\sin(\sqrt{N}x)\cos(\sqrt{N}y) - \sin(\sqrt{N}y)\cos(\sqrt{N}x)}{x - y} = \frac{\sin\sqrt{N}(x - y)}{\pi(x - y)}. \qquad (1.51)$$

Using (1.50) and that $\varrho(0) = \pi^{-1}$, we have

$$\frac{1}{\sqrt{N}\varrho(0)}K_N(x,y) \approx \frac{\sin\pi(\alpha_1 - \alpha_2)}{\pi(\alpha_1 - \alpha_2)},$$

which gives (1.35) for $E = 0$ after undoing the $\lambda \to \sqrt{N}\lambda$ magnification.
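The emergence of the sine kernel at $E = 0$ can be observed numerically. The sketch below evaluates $K_N$ both as the defining sum and through the Christoffel-Darboux formula, then compares the rescaled kernel with $\sin\pi(\alpha_1 - \alpha_2)/\pi(\alpha_1 - \alpha_2)$. The values of $N$, $\alpha_1$, $\alpha_2$ and the helper name `hermite_functions` are illustrative assumptions.

```python
import math
import numpy as np


def hermite_functions(n_max, x):
    """psi_0, ..., psi_{n_max-1} at the points x, via the three-term recurrence
    x psi_k = sqrt(k+1) psi_{k+1} + sqrt(k) psi_{k-1}."""
    x = np.asarray(x, dtype=float)
    table = np.empty((n_max, x.size))
    table[0] = np.exp(-x ** 2 / 4.0) / (2.0 * math.pi) ** 0.25
    if n_max > 1:
        table[1] = x * table[0]
    for k in range(1, n_max - 1):
        table[k + 1] = (x * table[k] - math.sqrt(k) * table[k - 1]) / math.sqrt(k + 1)
    return table


N = 400
alpha1, alpha2 = 0.8, -0.45
# Relation (1.50) with E = 0 and rho(0) = 1/pi.
x = math.pi * alpha1 / math.sqrt(N)
y = math.pi * alpha2 / math.sqrt(N)

table = hermite_functions(N + 1, np.array([x, y]))    # psi_0, ..., psi_N at x and y
K_sum = float(np.sum(table[:N, 0] * table[:N, 1]))    # definition of K_N
K_cd = math.sqrt(N) * (table[N, 0] * table[N - 1, 1]
                       - table[N, 1] * table[N - 1, 0]) / (x - y)   # Christoffel-Darboux

rescaled = math.pi / math.sqrt(N) * K_sum             # K_N / (sqrt(N) rho(0))
sine = float(np.sinc(alpha1 - alpha2))                # np.sinc(t) = sin(pi t) / (pi t)
print(K_sum, K_cd, rescaled, sine)
```

Increasing `N` should shrink the gap between `rescaled` and `sine`, in line with the $o(N^{-1/4})$ error terms in (1.49).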

The main technical input is the refined asymptotic formulae (1.49) for orthogonal polynomials. In case of
the classical orthogonal polynomials (appearing in the standard Gaussian Wigner and Wishart ensembles)
they are usually obtained by a Laplace asymptotics from their integral representation. For a general potential
$V$ the corresponding analysis is quite involved and depends on the regularity properties of $V$. One successful
approach was initiated by Fokas, Its and Kitaev [52] and by P. Deift and collaborators via the
Riemann-Hilbert method, see [22] and references therein. An alternative method was presented in [68, 67] using more
direct methods from orthogonal polynomials.
There have been many refinements and improvements in this very active research area related to invariant
ensembles as it reveals fruitful connections between random matrices, orthogonal polynomials, complex
analysis and even combinatorics (see [22]). One common input, however, is the explicit formula (1.43) for
the joint probability density that allows one to bring in orthogonal polynomials. We now depart from this
topic and we will focus on ensembles for which such an explicit formula is not available; the most prominent example
is the Wigner matrix. Apart from the Gaussian case, no explicit formula is available for the joint eigenvalue
distribution. Thus the basic algebraic connection between eigenvalue ensembles and orthogonal polynomials
is lacking and completely new methods needed to be developed. In the next section we summarize recent
results in this direction.

1.6 Local statistics of eigenvalues: new results 
1.6.1 Hermitian matrices with Gaussian convolutions 

The first rigorous partial result for bulk universality in the non-unitary case was given by Johansson [64],
see also Ben Arous and Péché [11] for extending [64] to the full bulk spectrum and the recent improvement
[65] on weakening moment conditions. The main result states that the bulk universality holds for Gaussian
divisible hermitian ensembles, i.e., hermitian ensembles of the form

$$H = \sqrt{1 - \varepsilon}\,\widehat{H} + \sqrt{\varepsilon}\,V, \qquad (1.52)$$

where $\widehat{H}$ is a hermitian Wigner matrix, $V$ is an independent standard GUE matrix and $\varepsilon$ is a positive
constant of order one, independent of $N$.
We will often use the parametrization

$$H = e^{-t/2}\widehat{H} + (1 - e^{-t})^{1/2}V. \qquad (1.53)$$

If embedded in a flow, then $t$ can be interpreted as time of an Ornstein-Uhlenbeck (OU) process. This
formalism incorporates the idea that matrices with Gaussian convolutions can be obtained as a matrix
valued stochastic process, namely as the solution of the following stochastic differential equation:

$$\mathrm{d}H_t = \frac{1}{\sqrt{N}}\,\mathrm{d}\beta_t - \frac{1}{2}H_t\,\mathrm{d}t, \qquad H_0 = \widehat{H}, \qquad (1.54)$$

where $\beta_t$ is a hermitian matrix valued process whose diagonal matrix elements are standard real Brownian
motions and whose off-diagonal matrix elements are standard complex Brownian motions. The distribution
of the solution to (1.54) for any fixed $t$ coincides with the distribution of (1.53). Note that infinite time,
$t = \infty$, corresponds to the GUE ensemble, so the matrices (1.53) interpolate between the Wigner matrix $\widehat{H}$
and the GUE. This point of view will be extremely useful in the sequel as it allows us to compare Wigner
matrices with Gaussian ones if the effect of the time evolution is under control.
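The interpolation (1.53) is straightforward to sample. The following sketch assumes the normalization $\mathbb{E}|h_{ij}|^2 = 1/N$ and uses a Bernoulli-type Wigner matrix as an illustrative non-Gaussian starting point; the function names and this choice of initial ensemble are assumptions of the example.

```python
import numpy as np


def wigner_bernoulli(n, rng):
    """Hermitian Wigner matrix with entries (+-1 +- i)/sqrt(2n) off the diagonal
    and +-1/sqrt(n) on it, so that E|h_ij|^2 = 1/n (illustrative choice)."""
    re = rng.choice([-1.0, 1.0], size=(n, n))
    im = rng.choice([-1.0, 1.0], size=(n, n))
    upper = np.triu(re + 1j * im, k=1) / np.sqrt(2.0)
    diag = np.diag(rng.choice([-1.0, 1.0], size=n))
    return (upper + upper.conj().T + diag) / np.sqrt(n)


def gue(n, rng):
    """GUE matrix with E|v_ij|^2 = 1/n."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    return (g + g.conj().T) / np.sqrt(2.0 * n)


def ou_matrix(h0, t, rng):
    """Sample H_t = e^{-t/2} H_0 + (1 - e^{-t})^{1/2} V as in (1.53)."""
    n = h0.shape[0]
    return np.exp(-t / 2.0) * h0 + np.sqrt(1.0 - np.exp(-t)) * gue(n, rng)


rng = np.random.default_rng(3)
n = 300
h0 = wigner_bernoulli(n, rng)
for t in (0.0, 0.1, 1.0, 10.0):
    ht = ou_matrix(h0, t, rng)
    # The OU normalization keeps the entry variance, hence (1/n) Tr H_t^2 stays near 1.
    second_moment = np.trace(ht @ ht).real / n
    print(t, second_moment)
```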

Alternatively, one can consider the density function $u_t$ of the real and imaginary parts of the matrix
elements as being evolved by the generator of the OU process:

$$\partial_t u_t = A u_t, \qquad A := \frac{1}{4}\frac{\partial^2}{\partial x^2} - \frac{x}{2}\frac{\partial}{\partial x}, \qquad (1.55)$$

where the initial condition $u_0(x)$ is the density (with respect to the reversible Gaussian measure) of the
distribution of the real and imaginary parts of the matrix elements of $\sqrt{N}\widehat{H}$. For the diagonal elements,
an OU process with a slightly different normalization is used. The OU process (1.55) keeps the expectation
zero and variance $\frac{1}{2}$ if the initial $u_0$ has these properties.

The joint distribution of the eigenvalues of Gaussian divisible hermitian random matrices of the form
(1.53) still has a certain determinantal structure. The formula is somewhat simpler if we write

$$H = \sqrt{1 - \varepsilon}\,(\widehat{H} + aV)$$

with $a = \sqrt{\varepsilon/(1 - \varepsilon)}$, i.e., we use the standard Gaussian convolution

$$\widetilde{H} = \widehat{H} + aV \qquad (1.56)$$

and then rescale at the end. Note that (1.56) can be generated by evolving the matrix elements by standard
Brownian motions $\beta$ up to time $t = a^2$, i.e. by solving

$$\mathrm{d}H_t = \frac{1}{\sqrt{N}}\,\mathrm{d}\beta_t, \qquad H_0 = \widehat{H}. \qquad (1.57)$$

Moreover, to be in line with the normalization convention of [44] that follows [64], we assume that the matrix
elements of the Wigner matrix $\widehat{H}$ and the GUE matrix $V$ have variance $1/(4N)$ instead of $1/N$ as in Definition
1.1. This means that the eigenvalues are scaled by a factor $\frac{1}{2}$ compared with the convention in the previous
sections, and the semicircle law (1.20) is modified to $2\pi^{-1}\sqrt{(1 - x^2)_+}$. This convention applies only up to
the end of this section.
Let $\mathbf{y} = (y_1, \dots, y_N)$ denote the eigenvalues of $\widehat{H}$ and $\mathbf{x} = (x_1, \dots, x_N)$ denote the eigenvalues of $\widetilde{H}$.
Then we have the following representation formulae (a slight variant of these formulae was given and used
by Johansson in [64] and they were motivated by similar formulae by Brézin and Hikami [17]):

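Lemma 1.2 [44, Proposition 3.2] Let $V$ be a GUE matrix. For any fixed hermitian matrix $\widehat{H}$ with
eigenvalues $\mathbf{y}$, the density function of the eigenvalues $\mathbf{x}$ of $\widetilde{H} = \widehat{H} + aV$ is given by

$$q_S(\mathbf{x}; \mathbf{y}) := \frac{1}{(2\pi S)^{N/2}}\,\frac{\Delta_N(\mathbf{x})}{\Delta_N(\mathbf{y})}\,\det\Big(e^{-(x_j - y_k)^2/2S}\Big)_{j,k=1}^N \qquad (1.58)$$

with $S = a^2/N$ and we recall that $\Delta_N$ denotes the Vandermonde determinant (1.48). The $m$-point correlation
functions of the eigenvalues of $\widetilde{H} = \widehat{H} + aV$,

$$p_{N,S}^{(m)}(x_1, \dots, x_m) := \int q_S(x_1, x_2, \dots, x_N; \mathbf{y})\,\mathrm{d}x_{m+1} \cdots \mathrm{d}x_N,$$

are given by the following formula

$$p_{N,S}^{(m)}(x_1, \dots, x_m) = \frac{(N - m)!}{N!}\det\Big(\mathcal{K}_N^S(x_i, x_j; \mathbf{y})\Big)_{i,j=1}^m, \qquad S = a^2/N. \qquad (1.59)$$

Here we define

$$\begin{aligned}
\mathcal{K}_N^S(u, v; \mathbf{y}) := \frac{1}{(2\pi i)^2(v - u)S}\int_\gamma \mathrm{d}z\int_\Gamma \mathrm{d}w\,&\big(e^{-(v - u)(w - r)/S} - 1\big)\prod_{j=1}^N\frac{w - y_j}{z - y_j} \\
&\times\frac{1}{w - r}\Big(w - r + z - u - S\sum_j\frac{y_j - r}{(w - y_j)(z - y_j)}\Big)\,e^{(w^2 - 2uw - z^2 + 2uz)/2S},
\end{aligned} \qquad (1.60)$$

where $r \in \mathbb{R}$ is an arbitrary constant. The integration curves $\gamma$ and $\Gamma$ in the complex plane are given by
$\gamma = \gamma_+ \cup \gamma_-$ as the union of two lines $\gamma_+ : s \to -s + i\omega$ and $\gamma_- : s \to s - i\omega$ ($s \in \mathbb{R}$) for any fixed $\omega > 0$
and $\Gamma$ is $s \to is$, $s \in \mathbb{R}$.

We note that $\Gamma$ can be shifted to any vertical line since the integrand is an entire function in $w$ and has a
Gaussian decay as $|\mathrm{Im}\,w| \to \infty$. The constants $r \in \mathbb{R}$ and $\omega > 0$ (appearing in the definition of the contour
$\gamma$) can be arbitrary and can be appropriately chosen in the contour integral estimates.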

The key step behind the proof of (1.58) is the Harish-Chandra-Itzykson-Zuber integral [63]

$$\int_{U(N)} e^{\mathrm{Tr}(UAU^*B)}\,\mathrm{d}U = (\mathrm{const.})\,\frac{\det\big(e^{a_ib_j}\big)_{i,j=1}^N}{\Delta_N(\mathbf{a})\Delta_N(\mathbf{b})},$$

where $A$, $B$ are two hermitian matrices with eigenvalues $\mathbf{a} = (a_1, \dots, a_N)$ and $\mathbf{b} = (b_1, \dots, b_N)$ and the
integration is over the unitary group $U(N)$ with respect to the Haar measure. We note that this is the step
where the unitary invariance (or the hermitian character of $H$) is crucially used; an analogous simple formula
is not known for other symmetry groups, see [18].

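For any test function $f$, and for a fixed matrix $\widehat{H}$, we have

$$\int f(\mathbf{x})\,q_S(\mathbf{x}; \mathbf{y})\,\mathrm{d}\mathbf{x} = (\mathrm{const.})\int f(\mathbf{x})\,e^{-\frac{N}{2a^2}\mathrm{Tr}(\widetilde{H} - \widehat{H})^2}\,\mathrm{d}\widetilde{H},$$

where we used that $V = a^{-1}(\widetilde{H} - \widehat{H})$ is a GUE matrix with distribution

$$\mathcal{P}(V)\,\mathrm{d}V = e^{-\frac{N}{2}\mathrm{Tr}\,V^2}\,\mathrm{d}V.$$

We set $\widetilde{H} = UXU^*$ to be the diagonalization of $\widetilde{H}$ with $X = \mathrm{diag}(\mathbf{x})$, then we have, using (1.46) and $S = a^2/N$,

$$\begin{aligned}
\int f(\mathbf{x})\,q_S(\mathbf{x}; \mathbf{y})\,\mathrm{d}\mathbf{x} &= (\mathrm{const.})\int_{\mathbb{R}^N}\int_{U(N)} f(\mathbf{x})\,e^{-\frac{1}{2S}\mathrm{Tr}(UXU^* - \widehat{H})^2}\,\Delta_N^2(\mathbf{x})\,\mathrm{d}U\,\mathrm{d}\mathbf{x} \\
&= (\mathrm{const.})\int_{\mathbb{R}^N}\Big[\int_{U(N)} e^{\frac{1}{S}\mathrm{Tr}(UXU^*\widehat{H})}\,\mathrm{d}U\Big]\,f(\mathbf{x})\,e^{-\frac{1}{2S}\sum_i(x_i^2 + y_i^2)}\,\Delta_N^2(\mathbf{x})\,\mathrm{d}\mathbf{x} \\
&= (\mathrm{const.})\int_{\mathbb{R}^N} f(\mathbf{x})\,\frac{\det\big(e^{x_iy_j/S}\big)_{i,j=1}^N}{\Delta_N(\mathbf{x})\Delta_N(\mathbf{y})}\,e^{-\frac{1}{2S}\sum_i(x_i^2 + y_i^2)}\,\Delta_N^2(\mathbf{x})\,\mathrm{d}\mathbf{x} \\
&= (\mathrm{const.})\int_{\mathbb{R}^N} f(\mathbf{x})\,\frac{\Delta_N(\mathbf{x})}{\Delta_N(\mathbf{y})}\,\det\Big(e^{-\frac{1}{2S}(x_j - y_k)^2}\Big)_{j,k=1}^N\,\mathrm{d}\mathbf{x},
\end{aligned}$$

which proves (1.58) (apart from the constant). This shows how the Vandermonde determinant structure
emerges for Gaussian convolutions. The proof of the contour integral representation (1.60) from (1.58) is a
bit more involved, see Proposition 3.2 of [44] (or Proposition 2.3 of [64]) for the details.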

Once (1.60) is given, the key idea is to view it as a complex integral suited for Laplace asymptotics or
saddle point calculation. More precisely, after some straightforward algebraic steps, it can be brought in the
following form (see Section 3.1 of [44]), where we already changed variables in the argument to detect the
microscopic structure. For any fixed $|u| < 1$ and $t = a^2$ we find from (1.60) that

$$\frac{1}{N\varrho(u)}\,\mathcal{K}_N^S\Big(u, u + \frac{\tau}{N\varrho(u)}; \mathbf{y}\Big) = N\int_\gamma\frac{\mathrm{d}z}{2\pi i}\int_\Gamma\frac{\mathrm{d}w}{2\pi i}\,h_N(w)\,g_N(z, w)\,e^{N(f_N(w) - f_N(z))} \qquad (1.61)$$

with

$$f_N(z) := \frac{1}{2t}(z^2 - 2uz) + \frac{1}{N}\sum_j\log(z - y_j),$$

$$g_N(z, w) := \frac{1}{t(w - r)}\big[w - r + z - u\big] - \frac{1}{N(w - r)}\sum_j\frac{y_j - r}{(w - y_j)(z - y_j)},$$

$$h_N(w) := \frac{1}{\tau}\Big(e^{-\tau(w - r)/t\varrho(u)} - 1\Big).$$

Notice the $N$ factor in front of the exponent in (1.61), indicating that the main contribution to the integral
comes from the saddle points, i.e., from $z$ and $w$ values where $f_N'(z) = f_N'(w) = 0$. Note that

$$f_N'(z) = \frac{z - u}{t} + \frac{1}{N}\sum_j\frac{1}{z - y_j}, \qquad (1.62)$$

i.e., it is essentially given by the empirical Stieltjes transform (1.23) of the eigenvalues of $\widehat{H}$. Suppose that
the Wigner semicircle law or, equivalently, (1.28) holds, then the saddle point $z_N$, $f_N'(z_N) = 0$, can be well
approximated by the solution to

$$\frac{z - u}{t} + 2\big(z - \sqrt{z^2 - 1}\big) = 0 \qquad (1.63)$$

(the formula for the Stieltjes transform slightly differs from (1.28) because of the different normalization of
$\widehat{H}$). It is easy to check that there are two solutions, $z^\pm$, with imaginary part given by $\pm 2t\sqrt{1 - u^2} + O(t^2)$
for small $t$.
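The location of the two saddle points of (1.63) can be checked with a short computation. The sketch below solves the equation by fixed-point iteration, which is a contraction for small $t$; the branch of $\sqrt{z^2 - 1}$ is taken to be the one analytic off $[-1, 1]$ and behaving like $z$ at infinity. The function names, the starting points and the values of $u$ and $t$ are illustrative assumptions.

```python
import numpy as np


def sqrt_z2_minus_1(z):
    """Branch of sqrt(z^2 - 1) that is analytic off [-1, 1] and behaves like z at infinity."""
    return np.sqrt(z - 1.0) * np.sqrt(z + 1.0)


def saddle(u, t, upper=True, iterations=200):
    """Fixed-point iteration z -> u - 2 t (z - sqrt(z^2 - 1)) for equation (1.63)."""
    z = complex(u, 0.1 if upper else -0.1)
    for _ in range(iterations):
        z = u - 2.0 * t * (z - sqrt_z2_minus_1(z))
    return z


u, t = 0.3, 0.01
z_plus = saddle(u, t, upper=True)
z_minus = saddle(u, t, upper=False)
residual = abs((z_plus - u) / t + 2.0 * (z_plus - sqrt_z2_minus_1(z_plus)))
predicted = 2.0 * t * np.sqrt(1.0 - u ** 2)     # leading order imaginary part
print(z_plus, z_minus, residual, predicted)
```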

Once the saddle points are identified, the integration contours $\gamma$ and $\Gamma$ in (1.61) can be shifted to pass
through the saddle points from a direction where $f_N$ is real and its second derivative has the "good sign", so
that the usual saddle point approximation holds. There are altogether four pairs of saddles $(z_N^\pm, w_N^\pm)$, but only
two give contributions to leading order. Their explicit evaluation gives the sine kernel (1.33) for $\mathcal{K}_N$ in the
$N \to \infty$ limit.

The key technical input is to justify the semicircle approximation used in passing from (1.62) to (1.63)
near the saddle point (some estimate is needed away from the saddle as well, but those are typically easier
to obtain). The standard argument for the semicircle law presented in Section 1.4.2 holds for any fixed $z$
with $\mathrm{Im}\,z > 0$, independently of $N$; especially important is that $\mathrm{Im}\,z > 0$ uniformly in $N$. Therefore this
argument can be used to justify (1.63) only for a fixed $t > 0$. Recall that $t = a^2$ is the variance of the
Gaussian convolution. This was the path essentially followed by Johansson who proved [64] that sine-kernel
universality (1.35) holds if the Wigner matrix has a Gaussian component comparable with its total size.

One implication of the local semicircle law explained in Section 1.1.1 is that the approximation from (1.62)
to (1.63) could be justified even for very short times $t$. Essentially any time of order $t \gg 1/N$ (with some
logarithmic factor) is allowed, but for technical reasons we carried out the estimates only for $t = N^{-1+\varepsilon}$
for any $\varepsilon > 0$ and we showed that the contour integral (1.61) is given by the sine-kernel even for such short
times [44]. This proved the sine-kernel universality in the form of (1.35) for any fixed $E$ in the bulk spectrum
for hermitian matrices with a Gaussian component of variance $N^{-1+\varepsilon}$.

1.6.2 Gaussian convolutions for arbitrary matrices: the local relaxation flow



The method presented in Section 1.6.1 heavily relies on the Brézin-Hikami type formula (1.60) that is
available only for hermitian matrices. For ensembles with other symmetries, in particular for symmetric
Wigner matrices, a new method was necessary. In a series of papers [43, 42, 46, 48, 49, 50] we developed
an approach based upon hydrodynamical ideas from interacting particle systems. We will present it in more
detail in Section 3; here we summarize the key points.

The starting point is a key observation of Dyson [31] from 1962. Let $\widehat{H}$ be an arbitrary fixed matrix and
consider the solution $H_t$ to (1.57). Recall that for each fixed $t$, $H_t$ has the same distribution as $\widehat{H} + \sqrt{t}V$,
where $V$ is a standard GUE matrix (independent of $\widehat{H}$). Dyson noticed that the evolution of the eigenvalues
of the flow $H_t$ is given by a coupled system of stochastic differential equations, commonly called the Dyson
Brownian motion (DBM in short). For convenience, we will replace the Brownian motions by OU processes
to keep the variance constant, i.e., we will use (1.54) instead of (1.57) to generate the matrix flow. The Dyson
Brownian motion we will use (and we will still call it DBM) is given by the following system of stochastic
differential equations for the eigenvalues $\lambda(t) = (\lambda_1(t), \dots, \lambda_N(t))$, see, e.g. Section 4.3.1 of [4],

$$\mathrm{d}\lambda_i = \frac{\mathrm{d}B_i}{\sqrt{N}} + \Big(-\frac{\beta}{4}\lambda_i + \frac{\beta}{2N}\sum_{j \ne i}\frac{1}{\lambda_i - \lambda_j}\Big)\mathrm{d}t, \qquad 1 \le i \le N, \qquad (1.64)$$

where $\{B_i : 1 \le i \le N\}$ is a collection of independent Brownian motions. The initial condition $\lambda(0)$ is
given by the eigenvalues of $\widehat{H}$. The choice of parameter $\beta = 2$ corresponds to the hermitian case, but the
process (1.64) is well defined for any $\beta \ge 1$ and the eigenvalues do not cross due to the strong repulsion
among them. The threshold $\beta = 1$ is critical for the non-crossing property. As $t \to \infty$, the distribution of
$\lambda(t)$ converges to the Gaussian $\beta$-ensemble distribution (1.44) as the global invariant measure; for example,
for $\beta = 2$, it converges to the GUE.

Using Dyson Brownian Motion, the question of universality for Gaussian divisible ensembles can be
translated into the question of the time needed for DBM to reach equilibrium. The time scale to approach
the global equilibrium is of order one but, as we eventually proved in [50], the decay to the local equilibrium is
much faster: it occurs already on a time scale of order $t \sim N^{-1}$. Since the local statistics of eigenvalues depend
exclusively on the local equilibrium, this means that the local statistics of Gaussian divisible ensembles with
a tiny Gaussian component of size $N^{-1+\varepsilon}$ are already universal.

We remark that using the relation between Gaussian divisible ensembles and the relaxation time of DBM,
the result of Johansson [64] can be interpreted as stating that the local statistics of GUE are reached via
DBM after time at most of order one. Our result from [44], explained in the previous section, indicates that
the decay to the local equilibrium occurs already in time $t \sim N^{-1+\varepsilon'}$. This is, however, only a reinterpretation
of the results since neither [64] nor [44] used hydrodynamical ideas. In particular, these proofs are valid only
for hermitian matrices since they used some version of the Brézin-Hikami formula.

To establish universality in full generality, we have developed a purely hydrodynamical approach based
upon the relaxation speed of DBM. The key point in this approach is that there is no restriction on the
symmetry type; the argument works equally for symmetric, hermitian or quaternion self-dual ensembles;
moreover, with some obvious modifications, it also works for random covariance matrices.

Our first paper that used hydrodynamical ideas is [43]. In this paper we extended Johansson's result
[64] to hermitian ensembles with a Gaussian component of size $t \gg N^{-3/4}$ by capitalizing on the fact that
the local statistics of eigenvalues depend exclusively on the approach to local equilibrium which in general
is faster than reaching global equilibrium. Unfortunately, the identification of local equilibria in [43] still
used explicit representations of correlation functions by orthogonal polynomials (following e.g. [75]), and
the extension to other ensembles in this way is not a simple task (see [85] for extension of [75] to symmetric
matrices to prove edge universality).

To depart from using orthogonal polynomials, we introduced new hydrodynamical tools in [42] which
entirely eliminated explicit formulas and gave a unified proof for the universality of symmetric and hermitian
Wigner matrices with a small Gaussian convolution. The size of the Gaussian component, equivalently, the
time needed to reach the local equilibrium in a sufficiently strong sense has increased from $N^{-3/4}$ to $N^{-\xi}$
with a small positive $\xi$ but the method has become general. The result was further generalized in [46] to
quaternion self-dual Wigner matrices and sample covariance matrices and even to generalized Wigner
matrices in [48, 49] (see Definition 1.1). Finally, in [50] we showed that the local equilibrium is already reached
after time $t \ge N^{-1+\varepsilon'}$, which is essentially optimal. More importantly, the hydrodynamical method not only
applies to all these specific ensembles, but it also gives a conceptual interpretation that the occurrence of the
universality is due to the relaxation to local equilibrium of the DBM.

Our hydrodynamical approach consists of two parts. First, we have a general theorem stating that under
certain structural and convexity conditions on the Hamiltonian $\mathcal{H}$ of the equilibrium measure of the DBM
(see (1.45) for a special case) and under a fairly strong control on the local density of eigenvalues, the local
equilibrium is reached within a short time $t \sim N^{-\xi}$, $\xi > 0$, in the sense that the local correlation functions
rescaled in the form of (1.35) coincide with the same correlation functions in equilibrium. By the general
Bakry-Émery [10] criterion, the speed of convergence to global equilibrium for DBM depends on the lower
bound on the Hessian of the Hamiltonian $\mathcal{H}$, which in our case is of order one. The key idea is to speed up
this convergence by modifying the Hamiltonian. We add to $\mathcal{H}$ an auxiliary potential of the form

$$W(\lambda) := \sum_{j=1}^N W_j(\lambda_j), \qquad W_j(\lambda) := \frac{1}{2R^2}(\lambda_j - \gamma_j)^2,$$

where $R \ll 1$ is a parameter, depending on $N$, and $\gamma_j$'s are the classical location of the eigenvalues, given by

$$\int_{-\infty}^{\gamma_j}\varrho(x)\,\mathrm{d}x = \frac{j}{N}. \qquad (1.65)$$

Here $\varrho(x)$ is the limiting density, e.g., $\varrho(x) = \varrho_{sc}(x)$ for Wigner matrices. The Hamiltonian $\widetilde{\mathcal{H}} := \mathcal{H} + W$
generates a new stochastic flow of the eigenvalues, called the local relaxation flow. The equilibrium Gibbs
measure given by $\widetilde{\mathcal{H}}$ will be called pseudo equilibrium measure. The convergence to equilibrium for this flow
is faster: it occurs on a time scale $R^2$. In fact, due to the strong repulsion between eigenvalues (reflected
by a singular repulsion potential (1.64)), the convergence is even faster for observables that depend only on
eigenvalue differences. Then we show that the modified dynamics is, in fact, not far from the original one,
using that the typical size of the auxiliary potential $W$ is small. More precisely, we will need to prove that
the eigenvalues $\lambda_j$ lie near $\gamma_j$ with a precision $N^{-1/2-\varepsilon}$, i.e., that

$$\mathbb{E}\,\frac{1}{N}\sum_{j=1}^N(\lambda_j - \gamma_j)^2 \le CN^{-1-2\varepsilon} \qquad (1.66)$$

holds with some $\varepsilon > 0$. This is the key input condition to our general theorem and it will be proved by a
strong control on the local density. The exponent $\xi$ in the time scale $t \sim N^{-\xi}$ is essentially $2\varepsilon$ appearing in
the estimate (1.66).
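The classical locations and the averaged deviation controlled in (1.66) can be computed for a sample matrix. The sketch below assumes the normalization of the earlier sections, i.e. the semicircle density $\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+}$ on $[-2, 2]$, and uses a GUE matrix as the test ensemble; the function names, the bisection solver and the matrix size are illustrative choices.

```python
import numpy as np


def semicircle_cdf(x):
    """Distribution function of the semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x ** 2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi


def classical_locations(n, iterations=60):
    """Solve semicircle_cdf(gamma_j) = j / n for j = 1, ..., n by bisection."""
    target = np.arange(1, n + 1) / n
    lo = np.full(n, -2.0)
    hi = np.full(n, 2.0)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        below = semicircle_cdf(mid) < target
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)


def gue_eigenvalues(n, rng):
    """Ordered eigenvalues of a GUE matrix with E|h_ij|^2 = 1/n."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    return np.linalg.eigvalsh((g + g.conj().T) / np.sqrt(2.0 * n))


rng = np.random.default_rng(4)
n = 200
gamma = classical_locations(n)
lam = gue_eigenvalues(n, rng)             # eigvalsh returns eigenvalues in increasing order
mean_sq_dev = np.mean((lam - gamma) ** 2)
print(mean_sq_dev, 1.0 / n)               # single-sample analogue of the left side of (1.66)
```

A single sample only illustrates the size of the deviations; the estimate (1.66) concerns the expectation.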




The second part of the hydrodynamical approach is to prove the necessary input conditions on the local
density for the general theorem, especially (1.66). This is the step where specific properties of the matrix
ensemble come into the game. To obtain relaxation to local equilibrium on time scale $t \sim N^{-\xi}$, $\xi > 0$, we
need to locate the eigenvalues with a precision at least $N^{-1/2-\varepsilon}$, $\varepsilon = \xi/2$. To obtain the optimal relaxation
time, $t \sim N^{-1}$, the eigenvalues need to be located essentially with an $N^{-1}$ precision, similarly to [44]. Very
crudely, the precision of locating eigenvalues corresponds to the scale $\eta$ on which the local semicircle law
holds, so this will be the key input to verify (1.66). The technical difficulty is that we need a fairly good
control on the local density near the spectral edges as well, since (1.66) involves all eigenvalues. Although
we are interested only in the bulk universality, i.e., local behavior away from the edges, we still need the
global speed of convergence for the modified dynamics that is influenced by the eigenvalues at the edge as
well. Recall that the control on the density near the edges becomes weaker since eigenvalues near the edge
tend to fluctuate more.

A good control on the local density has been developed in our previous work on Wigner matrices [39,
40, 41], but the edge behavior was not optimal. Nevertheless, in [46] we succeeded in proving (1.66) in a
somewhat complicated way, relying on some improvements of estimates from [39, 40, 41]. In [48], we found a
more direct way to control the local density and prove (1.66) more efficiently and a more streamlined version
is given in [49] which we will sketch in Section 2. The strongest result [50], to be explained in Section 2.4,
gives (1.66) with essentially $2\varepsilon = 1$.

We mention that these proofs also apply to generalized Wigner matrices where the variances satisfy
(1.17). In this case, we prove the local semicircle law down to essentially the smallest possible energy scale
$N^{-1}$ (modulo $\log N$ factors). This is sufficient to prove (1.66) and thus we can apply our general theorem
and prove the bulk universality of local statistics for these matrices. A much more difficult case is the Wigner
band matrices (1.18) where, roughly speaking, $\sigma_{ij}^2 = 0$ if $|i - j| > W$ for some $W \ll N$. In this case, we
obtain [48, 49] the local semicircle law to the energy scale $W^{-1}$, which is not strong enough to prove (1.66)
if $W$ is much smaller than $N$ (the case $W \ge N^{1-\delta}$ with some small $\delta$ still works).

1.6.3 Removing the Gaussian convolution I. The reverse heat flow 

In the previous two sections we discussed how to prove bulk universality for matrices with a small Gaussian
convolution. The method of [44] (Section 1.6.1) required only a very small Gaussian component (variance
$N^{-1+\varepsilon}$) but it was restricted to the hermitian case. The hydrodynamical method [50] (Section 1.6.2) works
in general (the earlier versions [48, 49] assumed a larger Gaussian component with variance $\sim N^{-\xi}$). Both
methods, however, need to be complemented by a perturbative step to remove this small Gaussian component.

There have been two independent approaches developed to remove the restriction on Gaussian divisibility.
The first method is the reverse heat flow argument that appeared first in [42] and was streamlined in Section
6 of [46]. The advantage of this method is that it can prove universality for a fixed energy $E$ as formulated
in (1.35); moreover, it is also very simple. The disadvantage is that it requires some smoothness of the
distribution $\nu$ of the rescaled entries $\sqrt{N}h_{ij}$ of the Wigner matrix. We always assume that $\nu$ has the
subexponential decay, i.e., that there are constants $C, \vartheta > 0$ such that for any $s > 0$

$$\int \mathbf{1}(|x| > s)\,\mathrm{d}\nu(x) \le C\exp(-s^\vartheta). \qquad (1.67)$$

The second method, which appeared slightly after the first, is the Green function comparison theorem via a 
perturbation argument with the four moment condition. The advantage of this approach is that it holds for 
any distribution with the only condition being the subexponential decay (1.67). The disadvantage is that it 
proves universality (1.35) only after some averaging over $E$. 

The four moment condition was originally introduced by Tao and Vu [96] and used later in [97, 98] in 
their study of eigenvalue perturbation which focused on the joint statistics of eigenvalues with fixed indices. 
In Section 4 we will present our approach [48] based on resolvent perturbation. Our result does not identify 
fixed eigenvalues but it is sufficiently strong to identify the local statistics and its proof is much simpler than 
in [96] (see Section 1.6.4 for more explanation). 

In this section we sketch the method of the reverse heat flow and in the next section we explain the four 
moment comparison principles. 

For simplicity of the presentation, we consider the hermitian case but we emphasize that this method 
applies to other symmetry classes as well, unlike the method outlined in Section 1.6.1. Consider the OU 
process defined in (1.55) that keeps the variance $\frac{1}{2}$ fixed and let $\gamma(dx) = \gamma(x)dx := \pi^{-1/2}e^{-x^2}dx$ denote 
the reversible measure for this process. Let $\nu_0(dx) = u(x)\gamma(dx)$ be the initial measure of the real and 
imaginary parts of the rescaled entries of the Wigner matrix $H$ (note that in most of this paper $\nu$ denotes 
the distribution of $\sqrt{N}h_{ij}$; for this discussion we introduced the notation $\nu_0$ for the common distribution of 
$\sqrt{N}\,\mathrm{Re}\, h_{ij}$ and $\sqrt{N}\,\mathrm{Im}\, h_{ij}$, $i \ne j$). We let the OU process (1.55) act on the matrix elements, i.e., we consider 
$H_t$, the solution to (1.54). For a fixed $t > 0$, the distribution of $H_t$ is given by 

$$e^{-t/2} H + (1 - e^{-t})^{1/2} V, \qquad (1.68)$$ 

where $V$ is a GUE matrix, independent of $H$. The distribution of the real and imaginary parts of the matrix 
elements of $H_t$ is then given by $u_t(x)\gamma(dx)$, where $u_t$ is the solution to (1.55) with initial data $u_0 = u$ (strictly 
speaking, these formulas hold for the off-diagonal elements; the diagonal elements have twice as large a variance 
and are subject to a slightly different OU flow). 

The main observation is that the arguments in Section 1.6.1 or Section 1.6.2 guarantee that the sine kernel 
holds for any hermitian matrix that has a Gaussian component of variance $N^{-1+\varepsilon}$ or $N^{-\varepsilon}$, respectively (the 
method of Section 1.6.1 applies only to the hermitian case, while Section 1.6.2 works in general). Given 
a Wigner matrix $H$, we do not necessarily have to compare $H$ with its Gaussian convolution (1.68); it is 
sufficient to find another Wigner matrix $\widehat{H}$ such that 

$$H \approx e^{-t/2} \widehat{H} + (1 - e^{-t})^{1/2} V \qquad (1.69)$$ 

with a very high precision. In fact, $\widehat{H}$ can even be chosen $t$-dependent. The following lemma shows that any 
Wigner matrix $H$ with a sufficiently smooth distribution can be arbitrarily well approximated by Gaussian 
divisible matrices of the form (1.69) if $t \sim N^{-\delta}$ for some $\delta > 0$. 

We assume that the initial density is positive, $u(x) > 0$, and it can be written as 

$$u(x) = e^{-V(x)}, \quad \text{with} \quad \sum_{j=1}^{2K} |V^{(j)}(x)| \le C_K (1 + x^2)^{C_K} \qquad (1.70)$$ 

with any $K \in \mathbb{N}$ and with sufficiently large constants $C_K$. Moreover, we assume that the initial single 
entry distribution $d\nu_0 = u\,d\gamma$ has a subexponential decay (1.67). The key technical lemma is the following 
approximation statement. 
Lemma 1.3 [46, Proposition 6.1] Suppose that for some $K > 0$, the measure $d\nu_0 = u\,d\gamma$ satisfies (1.67) 
and (1.70). Then there is a small constant $\alpha_K$ depending on $K$ such that for any $t \le \alpha_K$ there exists a 
probability density $g_t$ with mean zero and variance $\frac{1}{2}$ such that 

$$\int |e^{tA} g_t - u|\, d\gamma \le C t^K \qquad (1.71)$$ 

for some $C > 0$ depending on $K$. 

Furthermore, let $\mathcal{A} = A^{\otimes n}$, $F = u^{\otimes n}$ with some $n \le CN^2$. Denote by $G_t = g_t^{\otimes n}$. Then we also have 

$$\int |e^{t\mathcal{A}} G_t - F|\, d\gamma^{\otimes n} \le C N^2 t^K \qquad (1.72)$$ 

for some $C > 0$ depending on $K$. 

Sketch of the proof. Given $u$, we want to solve the equation 

$$e^{tA} g_t = u,$$ 

i.e., formally $g_t = e^{-tA}u$. However, the operator $e^{-tA}$ is like running a heat flow (with an OU drift) in 
reverse time, which is typically undefined unless $u$ is analytic. But we can define an approximate solution to 
the backward heat equation, i.e., we set 

$$g_t := \Big(1 - tA + \frac{t^2A^2}{2!} - \ldots + \frac{(-tA)^{K-1}}{(K-1)!}\Big) u.$$ 

Since $A$ is a second order differential operator and $u$ is sufficiently smooth, this expression is well defined; 
moreover 

$$e^{tA} g_t - u = O(t^K A^K u) = O(t^K).$$ 

This proves (1.71), and (1.72) directly follows from it. □ 
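
The order of the truncation error can be checked on a single eigenmode. The following is a minimal sketch under an assumption of ours that is not part of [46]: $u$ contains a mode $\varphi$ with $A\varphi = -\lambda\varphi$, $\lambda \ge 0$ (as Hermite polynomials are for an OU generator). The names `reverse_flow_error`, `lam`, `t` and `K` are hypothetical. On such a mode the truncated series acts as $\sum_{m<K} (\lambda t)^m/m!$ and $e^{tA}$ acts as $e^{-\lambda t}$, so the error is a Poisson tail of order $(\lambda t)^K/K!$.

```python
import math

def reverse_flow_error(lam: float, t: float, K: int) -> float:
    """Error |e^{tA} g_t - u| on one eigenmode with A*phi = -lam*phi.

    g_t = sum_{m<K} (-tA)^m / m! acts on phi as sum_{m<K} (lam*t)^m / m!,
    and e^{tA} acts on phi as exp(-lam*t).
    """
    truncated = sum((lam * t) ** m / math.factorial(m) for m in range(K))
    return abs(math.exp(-lam * t) * truncated - 1.0)

if __name__ == "__main__":
    K = 4
    for t in (0.04, 0.02, 0.01):
        err = reverse_flow_error(1.0, t, K)
        # Halving t should shrink the error by roughly a factor 2**K.
        print(f"t={t:.2f}  error={err:.3e}  error/t^K={err / t**K:.4f}")
```

The printed ratio `error/t^K` stays close to $1/K!$, which is the single-mode version of the $O(t^K)$ bound; the actual proof has to control $A^K u$ for a general smooth $u$, which this sketch does not attempt.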

Armed with Lemma 1.3, we can prove the sine kernel universality in the form of (1.35) for any hermitian 
Wigner matrix satisfying (1.67) and (1.70) for any fixed $|E| < 2$. We choose $n \sim N^2$ to be the number of 
independent OU processes needed to generate the flow of the matrix elements. By choosing $K$ large enough, 
we can compare the two measures $e^{t\mathcal{A}}G_t$ and $F$ in the total variational norm; for any observable $J : \mathbb{R}^n \to \mathbb{R}$ 
of the matrix elements, we have 

$$\Big| \int J\, (e^{t\mathcal{A}} G_t - F)\, d\gamma^{\otimes n} \Big| \le \|J\|_\infty\, C N^2 t^K.$$ 

In order to prove (1.35), appropriate observables $J$ need to be chosen that depend on the matrix elements 
via the eigenvalues and they express local correlation functions. It is easy to see that for such $J$, its norm 
$\|J\|_\infty$ may grow at most polynomially in $N$. But we can always choose $K$ large enough to compensate for 
it when $t$ is a negative power of $N$. Since the sine kernel holds for the distribution $e^{t\mathcal{A}}G_t$ with $t = N^{-1+\varepsilon}$ (Section 
1.6.1) or $t = N^{-\varepsilon}$ (Section 1.6.2), it will also hold for the Wigner measure $F$. 
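
To make the compensation explicit, suppose, purely for illustration, that $\|J\|_\infty \le N^{p}$ for some fixed exponent $p$ and that $t = N^{-\delta}$ with some $\delta > 0$ (both $p$ and this parametrization of $t$ are our assumptions). Inserting these into the total variation bound $\|J\|_\infty\, C N^2 t^K$ gives:

```latex
\Big| \int J \,\big(e^{t\mathcal{A}} G_t - F\big)\, d\gamma^{\otimes n} \Big|
  \;\le\; \|J\|_\infty \, C N^{2} t^{K}
  \;\le\; C\, N^{\,p + 2 - \delta K},
\qquad \text{which tends to zero as } N \to \infty \text{ as soon as } K > \frac{p+2}{\delta}.
```

So the smoother the initial density (the larger the admissible $K$ in (1.70)), the shorter the time $t$ one can afford.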

For symmetric matrices the reverse heat flow argument is exactly the same, but then only Section 1.6.2 
is available to obtain universality for short time; in particular, E in (1.35) needs to be averaged. 
1.6.4 Removing the Gaussian convolution II. The Green function comparison theorem 

Let $H$ and $H'$ be two Wigner ensembles such that the first four moments of the single entry distributions, $\nu$ 
and $\nu'$, coincide: 

$$m_j = m'_j, \qquad j = 1, 2, 3, 4, \qquad (1.73)$$ 

where 

$$m_j := \int_{\mathbb{R}} x^j\, d\nu(x) \quad \text{and} \quad m'_j := \int_{\mathbb{R}} x^j\, d\nu'(x).$$ 

For complex entries one has to take the collection of all $j$-moments, i.e., $m_j$ represents the collection of all 
$\int_{\mathbb{C}} x^a \bar{x}^b\, d\nu(x)$ with $a + b = j$. Recall that $\nu$ is the distribution of $\sqrt{N}h_{ij}$. By our normalization of Wigner 
matrices, the first moment is always zero and the second moment is one, so (1.73) is really a condition on 
the third and fourth moments. 
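
As a concrete illustration for real entries, the following sketch checks the moment condition (1.73) for two discrete laws against the standard real Gaussian, whose first four moments are 0, 1, 0, 3. The two example laws (symmetric Bernoulli on $\pm 1$ and a three-point law on $\{-\sqrt{3}, 0, \sqrt{3}\}$ with weights $1/6, 2/3, 1/6$) and the helpers `moments` and `matches` are our own illustrative choices, not taken from the cited papers.

```python
import math

def moments(points, weights, max_order=4):
    """Return [m_1, ..., m_max_order] for the discrete law sum_i w_i * delta_{x_i}."""
    assert math.isclose(sum(weights), 1.0)
    return [sum(w * x ** j for x, w in zip(points, weights))
            for j in range(1, max_order + 1)]

def matches(m, ref, tol=1e-12):
    """True if the two moment lists agree entrywise up to tol."""
    return all(abs(a - b) <= tol for a, b in zip(m, ref))

GAUSSIAN = [0.0, 1.0, 0.0, 3.0]  # first four moments of the standard real Gaussian

bernoulli = moments([-1.0, 1.0], [0.5, 0.5])
three_point = moments([-math.sqrt(3), 0.0, math.sqrt(3)], [1 / 6, 2 / 3, 1 / 6])

if __name__ == "__main__":
    print("Bernoulli  :", bernoulli, "matches Gaussian:", matches(bernoulli, GAUSSIAN))
    print("three-point:", three_point, "matches Gaussian:", matches(three_point, GAUSSIAN))
```

The Bernoulli law matches only the first three moments (its fourth moment is 1), while the three-point law matches all four, so only the latter satisfies (1.73) relative to the Gaussian reference.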

Our main result is the following comparison theorem for the joint distribution of Green functions. Here 
we only state the result in a somewhat simplified form; a more detailed presentation will be given in Section 4. 

Theorem 1.4 (Green function comparison theorem) [48, Theorem 2.3] Consider two Wigner matrices, 
$H$ and $H'$, with single entry distributions $\nu$ and $\nu'$. Assume that (1.67) and (1.73) hold for $\nu$ and $\nu'$. 
Let $G(z) = (H - z)^{-1}$ and $G'(z) = (H' - z)^{-1}$ denote the resolvents. Fix $k$ and suppose that the function 
$F : \mathbb{R}^k \to \mathbb{R}$ satisfies 

$$\sup_x |\nabla^j F(x)| \le N^{\varepsilon'}, \qquad 0 \le j \le 5. \qquad (1.74)$$ 

Fix small parameters $\kappa$ and $\varepsilon$. Then, for sufficiently small $\varepsilon'$ there is $c_0 > 0$ such that for any integers 
$\ell_1, \ldots, \ell_k$ and spectral parameters $z_j^m = E_j^m \pm i\eta$, $1 \le j \le \ell_m$, $m = 1, 2, \ldots, k$ with $E_j^m \in [-2 + \kappa, 2 - \kappa]$ and 
$\eta \ge N^{-1-\varepsilon}$, we have 

$$\Big| \mathbb{E} F\Big( \frac{1}{N^{\ell_1}} \mathrm{Tr} \Big[ \prod_{j=1}^{\ell_1} G(z_j^1) \Big], \ldots, \frac{1}{N^{\ell_k}} \mathrm{Tr} \Big[ \prod_{j=1}^{\ell_k} G(z_j^k) \Big] \Big) - \mathbb{E}' F(G \to G') \Big| \le N^{-c_0}. \qquad (1.75)$$ 

Here the shorthand notation $F(G \to G')$ means that we consider the same argument of $F$ as in the first 
term in (1.75), but all $G$ terms are replaced with $G'$. 

In fact, the condition (1.73) can be weakened to require that the third and fourth moments be only close: 

$$m_j = m'_j, \quad j = 1, 2, \qquad \text{and} \qquad |m_3 - m'_3| \le N^{-\delta - 1/2}, \quad |m_4 - m'_4| \le N^{-\delta} \qquad (1.76)$$ 

with some $\delta > 0$. Then (1.75) still holds, but $\varepsilon$ and $\varepsilon'$ have to be sufficiently small, depending on $\delta$, and $c_0$ 
will also depend on $\delta$. The precise estimate will be stated in Theorem 4.1. 

In other words, under the four moment matching condition for two Wigner ensembles, the expectations 
of traces of any combination of resolvent products coincide if the spectral parameters in the resolvents are 
not closer than $\eta = N^{-1-\varepsilon}$ to the real axis. Such a small distance corresponds to spectral resolution on 
scale $N^{-1-\varepsilon}$, i.e., it can identify local correlation functions of individual eigenvalues. It is an easy algebraic 
identity to express correlation functions from traces of resolvents, for example the one point correlation 
function (density) on scale $\eta$ is approximated by 

$$p^{(1)}_N(E) \approx \frac{1}{\pi N}\, \mathrm{Im}\, \mathrm{Tr}\, G(E + i\eta) = \frac{1}{2\pi i N} \big[ \mathrm{Tr}\, G(E + i\eta) - \mathrm{Tr}\, G(E - i\eta) \big]$$ 

and higher point correlation functions involve higher order polynomials of resolvents. Thus Theorem 1.4 
directly compares correlation functions (for the precise statement, see Theorem 4.2). We remark that taking 
traces is not essential in (1.75); a similar comparison principle works for matrix elements of the resolvents 
as well (see [48] for the precise formulation). In fact, the proof of Theorem 1.4 is a perturbation argument 
directly involving matrix elements of the resolvent. The key ingredient is a stronger form of the local 
semicircle law that directly estimates $G_{ii}$ and $G_{ij}$, $i \ne j$, and not only the normalized trace, $m(z) = \frac{1}{N}\sum_i G_{ii}$ 
(see (2.35)-(2.36) and (2.111) for the strongest result). 
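
The identity between the two resolvent expressions for the density on scale $\eta$ can be checked from the spectral decomposition $\mathrm{Tr}\, G(z) = \sum_i (\lambda_i - z)^{-1}$. The following is a minimal sketch; as an assumption made only for illustration, it uses a hypothetical equally spaced spectrum on $[-2, 2]$ in place of the eigenvalues of an actual Wigner matrix, and all function names are ours.

```python
import math

def trace_resolvent(eigs, z):
    """Tr (H - z)^{-1}, computed from the eigenvalues of H."""
    return sum(1.0 / (lam - z) for lam in eigs)

def density_from_imag(eigs, E, eta):
    """(1 / (pi N)) * Im Tr G(E + i eta)."""
    return trace_resolvent(eigs, complex(E, eta)).imag / (math.pi * len(eigs))

def density_from_difference(eigs, E, eta):
    """(1 / (2 pi i N)) * [Tr G(E + i eta) - Tr G(E - i eta)]."""
    diff = trace_resolvent(eigs, complex(E, eta)) - trace_resolvent(eigs, complex(E, -eta))
    return (diff / (2j * math.pi * len(eigs))).real

# Hypothetical spectrum: N equally spaced points, i.e. constant density 1/4 on [-2, 2].
N = 200
eigs = [-2.0 + 4.0 * (i + 0.5) / N for i in range(N)]

if __name__ == "__main__":
    E, eta = 0.0, 0.2  # eta is ten times the spacing 4/N
    print(density_from_imag(eigs, E, eta), density_from_difference(eigs, E, eta))
```

Both functions return the same number because $G(\bar z) = G(z)^*$ for hermitian $H$. For this toy spectrum the value is close to $\arctan(2/\eta)/(2\pi)$, the Poisson-kernel average of the constant density $1/4$ over $[-2, 2]$.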

A related theorem for eigenvalues was proven earlier by Tao and Vu [96]. Let $\lambda_1 < \lambda_2 < \ldots < \lambda_N$ and 
$\lambda'_1 < \lambda'_2 < \ldots < \lambda'_N$ denote the eigenvalues of $H$ and $H'$, respectively. The following theorem states that 
the joint distributions of any $k$-tuple of eigenvalues of the two ensembles are very close to each other on scale $1/N$. 

Theorem 1.5 (Four moment theorem for eigenvalues) [96, Theorem 15] Let $H$ and $H'$ be two Wigner 
matrices and assume that (1.67) and (1.73) hold for their single entry distributions $\nu$ and $\nu'$. For any 
sufficiently small positive $\varepsilon$ and $\varepsilon'$ and for any function $F : \mathbb{R}^k \to \mathbb{R}$ satisfying (1.74), and for any selection 
of a $k$-tuple of indices $i_1, i_2, \ldots, i_k \in [\varepsilon N, (1 - \varepsilon)N]$ away from the edge, we have 

$$\Big| \mathbb{E} F\big( N\lambda_{i_1}, N\lambda_{i_2}, \ldots, N\lambda_{i_k} \big) - \mathbb{E}' F\big( N\lambda'_{i_1}, N\lambda'_{i_2}, \ldots, N\lambda'_{i_k} \big) \Big| \le N^{-c_0} \qquad (1.77)$$ 

with some $c_0 > 0$. The condition (1.73) can be relaxed to (1.76), but $c_0$ will depend on $\delta$. 



Note that the arguments in (1.77) are magnified by a factor $N$ and $F$ is allowed to be concentrated on a 
scale $N^{-\varepsilon'}$, so the result is sufficiently precise to detect eigenvalue correlations on scale $N^{-1-\varepsilon'}$, i.e., even 
somewhat smaller than the eigenvalue spacing. Therefore Theorem 1.4 or 1.5 can prove bulk universality for 
a Wigner matrix $H$ if another $H'$ is found, with matching four moments, for which universality is already 
proved. In the hermitian case, the GUE matrices, or more generally the Gaussian divisible matrices (1.53) 
provide a good reference ensemble. Matching with a GUE matrix requires that the third and the fourth 
moments match, $m_3 = 0$, $m_4 = 3$. Since the location of the eigenvalues for GUE is known very precisely 
[61, 74], (1.77) can be translated into the limit of correlation functions as (1.35) even at a fixed energy $E$. If 
one aims only at the limiting gap distribution (1.36) instead of (1.35), then one can directly use the Gaussian 
divisible matrix (1.53) for matching. It is easy to check [20] that for any probability distribution $\nu$ with 
$m_1 = 0$ and $m_2 = 1$ that is supported on at least three points, there is a distribution with an order one 
Gaussian component so that the first four moments match. Therefore $H$ can be matched with a Gaussian 
divisible matrix for which Johansson [64] has proved universality. Using the result of [44] on the universality 
of hermitian Wigner matrices with a tiny Gaussian convolution (conclusion of Section 1.6.1), and using that 
exact moment matching (1.73) can be relaxed to (1.76), one can compare any Wigner matrix $H$ with its very 
short time $t \sim N^{-1+\varepsilon}$ Ornstein-Uhlenbeck convolution (1.53). This removes the requirements $m_3 = 0$ and 
that the support has at least three points and proves universality of correlation functions for any hermitian 
Wigner matrix in the sense of (1.35) after a little averaging in $E$ [45]. The only technical condition is the 
subexponential decay (1.67). 
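
The reason a very short Ornstein-Uhlenbeck convolution is compatible with approximate moment matching is that it moves the third and fourth moments only by $O(t)$. The following is a minimal sketch for a real scalar entry of mean zero and unit variance, which is a simplification of the hermitian setting; the function name `ou_moments` is hypothetical. For $X_t = e^{-t/2}X + (1 - e^{-t})^{1/2} g$ with $g$ an independent standard real Gaussian, expanding the powers and using independence gives closed formulas.

```python
import math

def ou_moments(m3: float, m4: float, t: float):
    """Third and fourth moments of X_t = e^{-t/2} X + sqrt(1 - e^{-t}) g.

    X has mean 0, variance 1 and moments m3, m4; g is an independent
    standard real Gaussian (moments 0, 1, 0, 3).
    """
    a2 = math.exp(-t)   # squared weight of X
    b2 = 1.0 - a2       # squared weight of g
    m3_t = a2 ** 1.5 * m3
    m4_t = a2 ** 2 * m4 + 6.0 * a2 * b2 + 3.0 * b2 ** 2
    return m3_t, m4_t

if __name__ == "__main__":
    N = 10 ** 6
    t = N ** (-0.9)     # a short time of the form N^{-1+eps} with eps = 0.1
    m3_t, m4_t = ou_moments(0.0, 1.0, t)   # symmetric Bernoulli entries
    print(f"t = {t:.3e}, |m4_t - m4| = {abs(m4_t - 1.0):.3e}")
```

For Bernoulli entries the formula reduces to $m_4(X_t) = 3 - 2e^{-2t}$, so the fourth moment changes by at most $4t$, a negative power of $N$ when $t$ is.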

In the symmetric case, the analogue of Johansson's result is not available (unless one uses [42]), and the 
only reference ensemble is GOE. Theorem 1.5 thus implies [96] universality for symmetric Wigner matrices 
whose single entry distribution has first four moments matching with GOE in the sense of (1.76). 
The careful reader may notice a subtle difference between the observable in (1.77) and the local correlation 
functions (1.35). While both detect the structure on scale $1/N$, in (1.77) the indices of the eigenvalues are 
fixed, while in (1.35) their locations are fixed. Roughly speaking, (1.77) can answer the question, say, "where are the 
$N/2$-th and the $(N/2 - 1)$-th eigenvalue". The local correlation function asks "what is the probability of the 
simultaneous event that there is an eigenvalue at $E$ and another one at $E' = E + \alpha/N$". These questions 
can be related only if some a-priori information is known about the location of the eigenvalues. 

Prior to [96], apart from the GUE case [61, 74], for no other ensembles could the eigenvalues be located 
with a precision $N^{-1+c_0}$ for small $c_0$, and such precision is needed to translate (1.77) into (1.35) for a 
fixed $E$. Using a three-moment matching version of Theorem 1.5, one can locate the eigenvalues of any 
Wigner ensemble with such precision, provided that the third moment vanishes (i.e. matches with GUE). 
Given this information, one can proceed to match the fourth moment by choosing an appropriate Gaussian 
divisible matrix. This is possible if the original distribution is supported on at least three points. This is why 
eventually (1.35) was proven in [96] under the condition that the third moment vanishes and the support 
contains at least three points. If one accepts that (1.35) will be proven after some averaging in $E$, then the 
necessary information on the location of the eigenvalues is much weaker and it can typically be obtained 
from the local semicircle law. 

In fact, tracking individual eigenvalues can be a difficult task; note that Theorem 1.5 in itself does not 
directly imply convergence of correlation functions, one still needs some information about the location of the 
$i$-th eigenvalue. On the other hand, Theorem 1.5 contains information about eigenvalues with fixed indices 
which was not contained in Theorem 1.4. We remark that the local semicircle law is an essential input for 
both theorems. 

The main reason why the proof of Theorem 1.4 is shorter is the fact that correlation functions 
can be identified from observables involving traces of resolvents $(H - z)^{-1}$ with $\mathrm{Im}\, z \sim N^{-1-\varepsilon}$ and these 
resolvents have an a-priori bound of order $|\mathrm{Im}\, z|^{-1} \le N^{1+\varepsilon}$, so perturbation formulas involving resolvents do 
not blow up. On the other hand, the individual eigenvalues tracked by Theorem 1.5 may produce resonances 
which could render some terms even potentially infinite (we will sketch the proof of Theorem 1.5 in Section 4). 
While level repulsion is a general feature of Wigner ensembles and it strongly suppresses resonances, the 
direct proof of the level repulsion is not an easy task. In fact, the most complicated technical estimate in 
[96] is the lower tail estimate on the gap distribution (Theorem 17 of [96]). It states that for any $c_0 > 0$ 
there is a $c_1 > 0$ such that 

$$\mathbb{P}\big( \lambda_{i+1} - \lambda_i \le N^{-1-c_0} \big) \le N^{-c_1} \qquad (1.78)$$ 

if the index $i$ lies in the bulk ($\varepsilon N \le i \le (1 - \varepsilon)N$). 

1.6.5 Summary of the new results on bulk universality 

Even the expert reader may find the recent developments slightly confusing since there have been many 
papers on bulk universality of Wigner matrices under various conditions and with different methods. Their 
interrelation was not always optimally presented in the research publications since it was, and still is, a fast 
developing story. In this section we try to give some orientation to the reader for the recent literature. 

As mentioned in the introduction, the main guiding principle behind these proofs of universality of the 
local eigenvalue statistics is to compare the local statistics of a Wigner matrix with another matrix with 
some Gaussian component. More precisely, our approach consists of three main steps: 

(1) the local semicircle law; 

(2) universality for Gaussian divisible ensembles, i.e., if the probability law of matrix elements contains a 
small Gaussian component; 

(3) universality for general ensembles; approximation by a Gaussian divisible ensemble to remove the small 
Gaussian component. 

It was clear to us from the very beginning that a good local semicircle law must be the first step in 
any proof of universality. In fact, all proofs of universality rely heavily on the details of the estimates one 
can obtain for the local semicircle law. We now summarize the existing results according to these three 
components. 

Although the proof of the local semicircle law down to the shortest scale $\eta \sim 1/N$ is rather simple now, it 
was only gradually achieved. In our first paper, [39], we gave an upper bound on the local density essentially 
down to the optimal energy scale $\eta \sim (\log N)/N$, but the local semicircle law itself was proven only on scale 
$\eta \gg N^{-2/3}$. In the second paper, [40], we proved the local semicircle law down to the scale $\eta \ge (\log N)^8/N$, 
almost optimal but still off by a logarithmic factor. The tail probability to violate the local semicircle law 
was also far from optimal. Both defects were remedied in [41] (see Theorem 1.9 below) where, additionally, 
an optimal delocalization result for eigenvectors was also proven (Theorem 2.22). In the first paper [39] we 
assumed a strong (Gaussian) decay condition and some convexity property of the single entry distribution 
that implies concentration (either via Brascamp-Lieb or logarithmic Sobolev inequalities). These technical 
conditions were subsequently removed and the Gaussian decay condition was replaced by a subexponential 
decay. Finally, in [48] and in its improved and streamlined version in [49], we obtained a much stronger 
error estimate to the local semicircle law, see (1.4)-(1.7), but these estimates still deteriorate at the edge. 
The optimal result [50], which we call the strong local semicircle law (Theorem 2.19), holds uniformly in the 
energy parameter. 

As for Step (2), the main point is that the Gaussian component enables one to exhibit the universal 
behavior. There are two ways to implement this idea: 

(i) the contour integral representation following Johansson [64] and Ben-Arous, Péché [11], but this option 
is available only for the hermitian case (or for the complex sample covariance case, [11]); 

(ii) the hydrodynamical approach, where a small Gaussian component (equivalently, a small time evolution 
of the OU process) already drives the system to local equilibrium. This approach applies to all 
ensembles, including symmetric, hermitian, symplectic and sample covariance ensembles, and it also 
gives the conceptual interpretation that the universality arises from the Dyson Brownian motion. 

Both approaches require a good local semicircle law. Additionally, the earlier papers on the hydrodynamical 
methods, [42] and [46], also assumed the logarithmic Sobolev inequality (LSI) for the single entry 
distribution $\nu$, essentially in order to verify (1.66) from the local semicircle law. In [49] we removed this last 
condition by using a strengthening of the local semicircle law which gave a simpler and more powerful proof 
of (1.66) (Theorem 2.7). Finally, the strong local semicircle law in [50] (Theorem 2.19) provided the optimal 
exponent $2\varepsilon = 1$ in (1.66). 

Summarizing the first two steps, we thus obtained bulk universality for generalized Wigner matrices 
(1.17) with a small Gaussian convolution under the sole condition of subexponential decay. This condition 
can be relaxed to a high order polynomial decay, but we have not worked out the details. Furthermore, 
although the extension of the strong local semicircle law to sample covariance matrices is straightforward, 
these details have not been carried out either (the earlier detailed proof [46] required LSI). 
The first two steps provide a large class of matrix ensembles with universal local statistics. In Step 
(3) it remains to approximate arbitrary matrix ensembles by these matrices so that the local statistics are 
preserved. The approximation step can be done in two ways: 

(i) via the reverse heat flow; 

(ii) via the Green function comparison theorem. 

The reverse heat flow argument is very simple, but it requires smoothness of the single entry distribution. 
This approach was used in [44], [42] and [46] and this leads to universality for all ensembles mentioned under 
the smoothness condition. This smoothness condition was then removed in [48] where the Green function 
comparison theorem was first proved. Unfortunately, we still needed the LSI and universality was established 
for matrices whose distribution $\nu$ is supported on at least three points. 

A stronger version of the local semicircle law was proved in [49] and all smoothness and support conditions 
on the distributions were removed. In summary, in [49] we obtained bulk universality of correlation functions 
(1.35) and gap distribution (1.36) for all classical Wigner ensembles (including the generalized Wigner 
matrices, (1.17)). The universality in (1.35) is understood after a small averaging in the energy parameter 
$E$. The only condition on the single entry distribution $\nu$ is the subexponential decay (1.67). 

The approach of Tao and Vu [96] uses a similar strategy of the three Steps (1)-(3) mentioned at the 
beginning of this section. For Step (2), the universality for hermitian Wigner matrices and complex sample 
covariance matrices was previously proved by Johansson [64] and Ben-Arous and Péché [11]. Step (3) follows 
from Tao-Vu's four moment theorem (Theorem 1.5) whose proof uses the local semicircle law, Step (1). 
This leads to the universality for the hermitian Wigner matrices [96] and complex sample covariance matrices 
[98] satisfying the condition that the support of the distribution contains at least three points (and, if one 
aims at a fixed energy result in (1.35), then the third moment also has to vanish). For the symmetric case, the 
matching of the first four moments was required. The Tao-Vu approach can also be applied to prove edge 
universality [97]. In [98] the subexponential decay was replaced with a sufficiently strong 
polynomial decay condition. The support and the third moment conditions can be removed by combining 
[96] with the result from our approach [44] and this has led to the universality of hermitian matrices [45] 
for any distribution including the Bernoulli measure. On the other hand, even for hermitian matrices, the 
variances of the matrix elements are required to be constant in this approach. 

Historically, Tao-Vu's first paper on the universality [96] appeared shortly after the paper [44] on the 
universality of hermitian matrices. A common ingredient for both [44] and [96] is the local semicircle law and 
the eigenfunction delocalization estimates that were essentially available from [40, 41], but due to certain 
technical conditions, they were reproved in [96]. The local semicircle law for sample covariance matrices 
was first proved in [46] and a slightly different version was given in [98] with some change in the technical 
assumptions tailored to the application. 

The four moment condition first appeared in the four moment theorem by Tao-Vu [96] (Theorem 1.5) and 
it was used in the Green function comparison theorem [48] (Theorem 1.4). The four moment theorem concerns 
individual eigenvalues and thus it contains information about the eigenvalue gap distribution directly. In 
order to translate this information into correlation functions, the locations of individual eigenvalues of the 
comparison ensemble are required. The Green function comparison theorem, on the other hand, can be 
used to compare the correlation functions directly, but the information on the individual eigenvalues is 
weaker. Nevertheless, a standard exclusion-inclusion principle argument (like the one presented in (1.38)) 
concludes the universality of the gap distribution as well. Since individual eigenvalues tend to fluctuate and 
Green functions are more stable, this explains why the proof of the four moment theorem for eigenvalues is 
quite involved but the Green function comparison theorem is very simple. Furthermore, the Green function 
comparison theorem yields not only spectral information, but information on matrix elements as well. 

1.6.6 New results on edge universality 

Recall that $\lambda_N$ is the largest eigenvalue of the random matrix. The probability distribution functions of $\lambda_N$ 
for the classical Gaussian ensembles are identified by Tracy and Widom [101, 102] to be 

$$\lim_{N \to \infty} \mathbb{P}\big( N^{2/3}(\lambda_N - 2) \le s \big) = F_\beta(s), \qquad (1.79)$$ 

where the function $F_\beta(s)$ can be computed in terms of Painlevé equations and $\beta = 1, 2, 4$ corresponds to 
the standard classical ensembles. The distribution of $\lambda_N$ is believed to be universal and independent of the 
Gaussian structure. The strong local semicircle law, Theorem 2.19, combined with a modification of the 
Green function comparison theorem, Theorem 1.4, tailored to the spectral edge implies the following version of 
universality of the extreme eigenvalues: 



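Theorem 1.6 (Universality of extreme eigenvalues) [50, Theorem 2.4] Suppose that we have two $N \times N$ 
generalized Wigner matrices, $H^{(v)}$ and $H^{(w)}$, with matrix elements $h_{ij}$ given by the random variables 
$N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the subexponential decay condition (1.67) 
uniformly for any $i, j$. Let $\mathbb{P}^v$ and $\mathbb{P}^w$ denote the probability and $\mathbb{E}^v$ and $\mathbb{E}^w$ the expectation with respect to 
these collections of random variables. If the first two moments of $v_{ij}$ and $w_{ij}$ are the same, i.e. 

$$\mathbb{E}^v\, \bar{v}_{ij}^{\,l} v_{ij}^{\,u} = \mathbb{E}^w\, \bar{w}_{ij}^{\,l} w_{ij}^{\,u}, \qquad 0 \le l + u \le 2, \qquad (1.80)$$ 

then there is an $\varepsilon > 0$ and $\delta > 0$ depending on $\vartheta$ in (1.67) such that for any $s \in \mathbb{R}$ we have 

$$\mathbb{P}^v\big( N^{2/3}(\lambda_N - 2) \le s - N^{-\varepsilon} \big) - N^{-\delta} \le \mathbb{P}^w\big( N^{2/3}(\lambda_N - 2) \le s \big) \le \mathbb{P}^v\big( N^{2/3}(\lambda_N - 2) \le s + N^{-\varepsilon} \big) + N^{-\delta} \qquad (1.81)$$ 

for $N$ sufficiently large independently of $s$. An analogous result holds for the smallest eigenvalue $\lambda_1$. 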

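Theorem 1.6 can be extended to finite correlation functions of extreme eigenvalues. For example, we have 
the following extension to (1.81): 

$$\mathbb{P}^v\big( N^{2/3}(\lambda_N - 2) \le s_1 - N^{-\varepsilon}, \ldots, N^{2/3}(\lambda_{N-k} - 2) \le s_{k+1} - N^{-\varepsilon} \big) - N^{-\delta}$$ 

$$\le \mathbb{P}^w\big( N^{2/3}(\lambda_N - 2) \le s_1, \ldots, N^{2/3}(\lambda_{N-k} - 2) \le s_{k+1} \big) \qquad (1.82)$$ 

$$\le \mathbb{P}^v\big( N^{2/3}(\lambda_N - 2) \le s_1 + N^{-\varepsilon}, \ldots, N^{2/3}(\lambda_{N-k} - 2) \le s_{k+1} + N^{-\varepsilon} \big) + N^{-\delta}$$ 

for all $k$ fixed and $N$ sufficiently large. 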

The edge universality for Wigner matrices was first proved via the moment method by Soshnikov [91] (see 
also the earlier work [88]) for hermitian and symmetric ensembles with symmetric single entry distributions 
$\nu$ to ensure that all odd moments vanish. By combining the moment method and Chebyshev polynomials 
[51], Sodin proved edge universality of band matrices and some special class of sparse matrices [89, 90]. 

The removal of the symmetry assumption was not straightforward. The approach of [89, 90] is restricted 
to ensembles with symmetric distributions. The symmetry assumption was partially removed in [76, 77] 
and significant progress was made in [97] which assumes only that the first three moments of two Wigner 
ensembles are identical. In other words, the symmetry assumption was replaced by the vanishing third 
moment condition for Wigner matrices. For a special class of ensembles, the Gaussian divisible hermitian 
ensembles, edge universality was proved [65] under the sole condition that the second moment is finite. By a 
combination of methods from [65] and [97], the same result can be proven for all hermitian Wigner ensembles 
with finite second moment [65]. 

In comparison with these results, Theorem 1.6 does not imply the edge universality of band matrices 
or sparse matrices [89, 90], but it implies in particular that, for the purpose of identifying the distribution 
of the top eigenvalue for a generalized Wigner matrix, it suffices to consider generalized Wigner ensembles 
with Gaussian distribution. Since the distributions of the top eigenvalues of the Gaussian Wigner ensembles 
are given by $F_\beta$ (1.79), Theorem 1.6 shows the edge universality of the standard Wigner matrices under 
the subexponential decay assumption alone. We remark that one can use Theorem 2.20 as an input in 
the approach of [65] to prove that the distributions of the top eigenvalues of generalized hermitian Wigner 
ensembles with Gaussian distributions are given by $F_2$. But for ensembles in a different symmetry class, 
there is no corresponding result to identify the distribution of the top eigenvalue with $F_\beta$. 

Finally, we comment that the subexponential decay assumption in our approach, though it can be weakened, 
is far from optimal for edge universality [7, 13, 77]. 



1.7 Level repulsion and Wegner estimate on very short scales 

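One of our earlier local semicircle laws for Wigner matrices, 

$$|m(z) - m_{sc}(z)| \lesssim \frac{1}{\sqrt{N\eta\kappa}}, \qquad \kappa := \big| |E| - 2 \big|, \qquad (1.83)$$ 

proven in Theorem 4.1 of [43], can be turned into a direct estimate on the empirical density in the form 

$$|\varrho_\eta(E) - \varrho_{sc}(E)| \lesssim \frac{1}{\sqrt{N\eta\kappa}}, \qquad \kappa = \big| |E| - 2 \big|.$$ 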

Here $\varrho_\eta$ denotes the empirical density $\varrho(x) = \frac{1}{N}\sum_i \delta(\lambda_i - x)$ smoothed out on a scale $\eta$. This result asserts 
that the empirical density on scales $\eta \gg O(1/N)$ is close to the semicircle density. On even smaller scales 
$\eta \le O(1/N)$, the empirical density fluctuates, but its average, $\mathbb{E}\,\varrho_\eta(E)$, remains bounded uniformly in $\eta$. 
This is a type of Wegner estimate that plays a central role in the localization theory of random Schrödinger 
operators. In particular, it says that the probability of finding at least one eigenvalue in an interval $I$ of size 
$\eta = \varepsilon/N$ is bounded by $C\varepsilon$ uniformly in $N$ and $\varepsilon \le 1$, i.e., no eigenvalue can stick to any energy value $E$. 
Furthermore, if the eigenvalues were independent (forming a Poisson process), then the probability of finding 
$n = 1, 2, 3, \ldots$ eigenvalues in $I$ would be proportional to $\varepsilon^n$. For random matrices in the bulk of the spectrum 
this probability is much smaller. This phenomenon is known as level repulsion and the precise statement is 
the following: 

Theorem 1.7 [41, Theorem 3.4 and 3.5] Consider symmetric or hermitian Wigner matrices with a single 
entry distribution $\nu$ that has a Gaussian decay. Suppose $\nu$ is absolutely continuous with a strictly positive 
and smooth density. Let $|E| < 2$ and $I = [E - \eta/2, E + \eta/2]$ with $\eta = \varepsilon/N$ and let $\mathcal{N}_I$ denote the number of 
eigenvalues in $I$. Then for any fixed $n$, 

$$\mathbb{P}(\mathcal{N}_I \ge n) \le \begin{cases} C_n\, \varepsilon^{n^2} & \text{[hermitian case]} \\ C_n\, \varepsilon^{n(n+1)/2} & \text{[symmetric case]} \end{cases} \qquad (1.84)$$ 

uniformly in $\varepsilon \le 1$ and for all sufficiently large $N$. 
The exponents are optimal as one can easily see from the Vandermonde determinant in the joint probability 
density (1.43) for invariant ensembles. The sine kernel behavior (1.32) indicates level repulsion and 
even a lower bound on $\mathbb{P}(\mathcal{N}_I \ge n)$, but usually not on arbitrarily small scales since the sine kernel is typically 
proven only as a weak limit (see (1.35)). 
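
The exponent count behind this remark can be spelled out. As a heuristic sketch (not the proof of [41]), suppose that the joint density of $n$ eigenvalues confined to an interval of length $\varepsilon/N$ behaves like the Vandermonde factor $\prod_{i<j}|\lambda_i - \lambda_j|^\beta$, with $\beta = 1$ in the symmetric and $\beta = 2$ in the hermitian case, and rescale $\lambda_i = E + x_i/N$. A scaling argument then gives:

```latex
\mathbb{P}(\mathcal{N}_I \ge n)
  \;\sim\; \int_{[0,\varepsilon]^n} \prod_{i<j} |x_i - x_j|^{\beta}\, dx_1 \cdots dx_n
  \;=\; C_{n,\beta}\; \varepsilon^{\,n + \beta n(n-1)/2},
\qquad
n + \beta\,\frac{n(n-1)}{2} =
\begin{cases}
  n^2, & \beta = 2 \ \text{(hermitian)},\\[2pt]
  n(n+1)/2, & \beta = 1 \ \text{(symmetric)}.
\end{cases}
```

For independent (Poisson) points only the volume factor $\varepsilon^{n}$ would remain, which is the comparison made before Theorem 1.7.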

We also mention that (1.78) (Theorem 17 from [96]) is a certain type of level repulsion bound as well, 
but the exponents are not optimal and it does not hold on arbitrarily small scales. However, the virtue of 
(1.78) is that it assumes no smoothness of the distribution; in particular it holds for discrete distributions 
as well. Clearly, say, for the Bernoulli distribution, even the Wegner estimate, (1.84) for $n = 1$, cannot hold on 
superexponentially small scales $\varepsilon \sim 2^{-N^2}$. 

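Sketch of the proof. The first step of the proof is to provide an upper bound on $\mathcal{N}_I$. Let $H^{(k)}$ denote the
$(N-1) \times (N-1)$ minor of $H$ after removing the $k$-th row and $k$-th column. Let $\lambda_\alpha^{(k)}$, $\alpha = 1, 2, \ldots, N-1$,
denote the eigenvalues of $H^{(k)}$ and $u_\alpha^{(k)}$ denote its eigenvectors. Computing the $(k,k)$ diagonal element of
the resolvent $(H - z)^{-1}$ we easily obtain the following expression for $m(z) = m_N(z)$:

$$m(z) = \frac{1}{N} \sum_{k=1}^N \frac{1}{H - z}(k,k) = \frac{1}{N} \sum_{k=1}^N \Big[ h_{kk} - z - \frac{1}{N} \sum_{\alpha=1}^{N-1} \frac{\xi_\alpha^{(k)}}{\lambda_\alpha^{(k)} - z} \Big]^{-1}, \tag{1.85}$$

where

$$\xi_\alpha^{(k)} := N\, |a^{(k)} \cdot u_\alpha^{(k)}|^2, \tag{1.86}$$

and $a^{(k)}$ is the $k$-th column of $H$ without the diagonal element $h_{kk}$. Taking the imaginary part, and using

$$\mathcal{N}_I \le C N \eta\, \operatorname{Im} m(z), \qquad z = E + i\eta, \tag{1.87}$$

we have

$$\mathcal{N}_I \le C N \eta^2 \sum_{k=1}^N \Big| \sum_{\alpha\,:\,\lambda_\alpha^{(k)} \in I} \xi_\alpha^{(k)} \Big|^{-1}. \tag{1.88}$$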
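As a small numerical illustration of the counting bound (1.87): each eigenvalue in $I = [E - \eta/2, E + \eta/2]$ contributes at least $4/(5\eta)$ to $N \operatorname{Im} m(E + i\eta)$, so one may take $C = 5/4$. The sketch below checks this on an arbitrary deterministic list of points standing in for a spectrum; the point set and the helper names are illustrative assumptions, not objects from the text.

```python
def stieltjes(points, z):
    """m(z) = (1/N) * sum_j 1 / (lambda_j - z) for the empirical measure of `points`."""
    return sum(1.0 / (lam - z) for lam in points) / len(points)


def count_in_window(points, energy, eta):
    """Number of points in I = [E - eta/2, E + eta/2]."""
    return sum(1 for lam in points if abs(lam - energy) <= eta / 2)


# Hypothetical "spectrum" (illustration only): N equally spaced points in [-2, 2].
N = 200
points = [-2.0 + 4.0 * (j + 0.5) / N for j in range(N)]

energy, eta = 0.3, 0.05
n_I = count_in_window(points, energy, eta)
# Each point in I contributes at least 4 / (5 * eta) to N * Im m(E + i*eta).
bound = 1.25 * N * eta * stieltjes(points, complex(energy, eta)).imag

print(n_I, bound)
```

The inequality holds for any point configuration, which is why it can be used as a deterministic first step before any probabilistic input.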



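It is an elementary fact that, for each fixed $k$, the eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ of $H$ and the eigenvalues
$\mu_1 \le \mu_2 \le \cdots \le \mu_{N-1}$ of $H^{(k)}$ are interlaced, meaning that

$$\lambda_1 \le \mu_1 \le \lambda_2 \le \mu_2 \le \cdots \le \mu_{N-1} \le \lambda_N. \tag{1.89}$$

This can be seen by analyzing the equation for the eigenvalues $\lambda$ in terms of the eigenvalues $\mu$'s,

$$\lambda - h_{kk} = \sum_{\alpha=1}^{N-1} \frac{|a^{(k)} \cdot u_\alpha|^2}{\lambda - \mu_\alpha},$$

where $u_\alpha$ is the normalized eigenvector of the minor $H^{(k)}$ belonging to $\mu_\alpha$.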
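The interlacing can be seen concretely from the eigenvalue equation $\lambda - h_{kk} = \sum_\alpha |a^{(k)} \cdot u_\alpha|^2/(\lambda - \mu_\alpha)$: the difference of the two sides is increasing between consecutive poles $\mu_\alpha$, so it has exactly one zero in each gap. The following sketch, with made-up values for $h_{kk}$, the $\mu_\alpha$ and the weights $|a^{(k)} \cdot u_\alpha|^2$ (all hypothetical), locates these zeros by bisection.

```python
def secular(lam, h, mus, weights):
    """g(lam) = lam - h - sum_a w_a / (lam - mu_a); its zeros are the eigenvalues of H."""
    return lam - h - sum(w / (lam - mu) for mu, w in zip(mus, weights))


def root_in(lo, hi, h, mus, weights, steps=200):
    """Bisection for the unique zero of the increasing function g on the open interval (lo, hi)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if secular(mid, h, mus, weights) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Illustrative data: diagonal entry h, minor eigenvalues mu_a, weights |a . u_a|^2 > 0.
h = 0.3
mus = [-1.0, 0.2, 1.5]
weights = [0.5, 0.1, 0.8]

# One eigenvalue lies in each of (-inf, mu_1), (mu_1, mu_2), ..., (mu_{N-1}, +inf).
big = abs(h) + max(abs(m) for m in mus) + sum(weights) + 1.0
edges = [-big] + mus + [big]
lams = [root_in(edges[i], edges[i + 1], h, mus, weights) for i in range(len(edges) - 1)]

print(lams)
```

A quick consistency check is that the computed $\lambda$'s sum to the trace $h_{kk} + \sum_\alpha \mu_\alpha$.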

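The interlacing property clearly implies that the number of $\lambda_\alpha^{(k)}$ in $I$ is at least $\mathcal{N}_I - 1$. For each fixed $k$
the random variables $\{\xi_\alpha^{(k)} : \alpha = 1, 2, \ldots, N-1\}$ are almost independent and have expectation value one,
thus the probability of the event

$$\Omega_k := \Big\{ \sum_{\alpha\,:\,\lambda_\alpha^{(k)} \in I} \xi_\alpha^{(k)} \le \delta (\mathcal{N}_I - 1) \Big\}$$

is negligible for small $\delta$ [41, Lemma 4.7]. On the complement of all $\Omega_k$ we thus have from (1.88) that

$$\mathcal{N}_I \le \frac{C N^2 \eta^2}{\delta (\mathcal{N}_I - 1)},$$

from which it follows that $\mathcal{N}_I \le C N \eta$ with very high probability. One of the precise results of this type is: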

Lemma 1.8 [41] Assuming the single entry distribution $\nu$ has a Gaussian decay, then for any
interval $I$ with $|I| \ge (\log N)/N$ we have

$$\mathbb{P}(\mathcal{N}_I \ge K N |I|) \le C e^{-c\sqrt{K N |I|}}.$$

We remark that the Gaussian decay condition can be weakened and a somewhat weaker result holds also for
even shorter intervals $|I| \ge 1/N$ (see Theorem 5.1 [41]).



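The proof of Theorem 1.7 also starts with (1.85) and (1.87). They imply

$$\mathcal{N}_I \le C \eta \sum_{k=1}^N \frac{1}{(a_k^2 + b_k^2)^{1/2}} \tag{1.90}$$

with

$$a_k := \eta + \frac{1}{N} \sum_{\alpha=1}^{N-1} \frac{\eta\, \xi_\alpha^{(k)}}{(\lambda_\alpha^{(k)} - E)^2 + \eta^2}, \qquad b_k := h_{kk} - E - \frac{1}{N} \sum_{\alpha=1}^{N-1} \frac{(\lambda_\alpha^{(k)} - E)\, \xi_\alpha^{(k)}}{(\lambda_\alpha^{(k)} - E)^2 + \eta^2},$$

i.e., $a_k$ and $b_k$ are the imaginary and real part, respectively, of the reciprocal of the summands in (1.85). The
proof of Lemma 1.8 relied only on the imaginary part, i.e., $b_k$ in (1.90) was neglected in the estimate (1.88).
In the proof of Theorem 1.7, however, we make an essential use of $b_k$ as well. Since typically $1/N \lesssim |\lambda_\alpha^{(k)} - E|$,
we note that $a_k^2$ is much smaller than $b_k^2$ if $\eta \ll 1/N$ and this is the relevant regime for the Wegner estimate
and for the level repulsion.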

Assuming a certain smoothness on the single entry distribution $d\nu$, the distribution of the variables $\xi_\alpha^{(k)}$
will also be smooth even if we fix an index $k$ and we condition on the minor $H^{(k)}$, i.e., if we fix the eigenvalues
$\lambda_\alpha^{(k)}$ and the eigenvectors $u_\alpha^{(k)}$. Although the random variables $\xi_\alpha^{(k)} = N |a^{(k)} \cdot u_\alpha^{(k)}|^2$ are not independent for
different $\alpha$'s, they are sufficiently uncorrelated so that the distribution of $b_k$ inherits some smoothness from
$a^{(k)}$. Sufficient smoothness on the distribution of $b_k$ makes the expectation value of $(a_k^2 + b_k^2)^{-p/2}$ finite for any
$p > 0$. This will give a bound on the $p$-th moment of $\mathcal{N}_I$ which will imply (1.84).

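We present this idea for hermitian matrices and for the simplest case $n = 1$. From (1.90) we have

$$\mathbb{P}(\mathcal{N}_I \ge 1) \le \mathbb{E}\, \mathcal{N}_I^2 \le C (N\eta)^2\, \mathbb{E}\, \frac{1}{a_1^2 + b_1^2}.$$

Dropping the superscript $k = 1$ and introducing the notation

$$c_\alpha = \frac{\varepsilon}{N^2 (\lambda_\alpha - E)^2 + \varepsilon^2}, \qquad d_\alpha = \frac{N (\lambda_\alpha - E)}{N^2 (\lambda_\alpha - E)^2 + \varepsilon^2},$$

we have

$$\mathbb{P}(\mathcal{N}_I \ge 1) \le C \varepsilon^2\, \mathbb{E} \Big[ \Big( \sum_{\alpha=1}^{N-1} c_\alpha \xi_\alpha \Big)^2 + \Big( h - E - \sum_{\alpha=1}^{N-1} d_\alpha \xi_\alpha \Big)^2 \Big]^{-1}. \tag{1.91}$$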
From one version of the local semicircle law (see Theorem 1.9 below), we know that with a very high
probability, there are several eigenvalues $\lambda_\alpha$ within a distance of $O(1/N)$ of $E$. Choosing four such eigenvalues,
we can guarantee that for some index $\gamma$

$$c_\gamma,\ c_{\gamma+1} \ge C\varepsilon, \qquad d_{\gamma+2},\ d_{\gamma+3} \ge C \tag{1.92}$$

for some positive constant $C$. If the $\xi_\alpha$'s were indeed independent and distributed according to the square of
a complex random variable $z_\alpha$ with a smooth and decaying density $d\mu(z)$ on the complex plane, then the
expectation in (1.91) would be bounded by

$$\sup_E \int \frac{1}{\big( c_\gamma |z_\gamma|^2 + c_{\gamma+1} |z_{\gamma+1}|^2 \big)^2 + \big( E - d_{\gamma+2} |z_{\gamma+2}|^2 - d_{\gamma+3} |z_{\gamma+3}|^2 \big)^2} \prod_{j=0}^{3} d\mu(z_{\gamma+j}). \tag{1.93}$$

A simple calculation shows that this integral is bounded by $C\varepsilon^{-1}$ assuming the lower bounds (1.92). Combining
this bound with (1.91), we obtain (1.84) for $n = 1$. The proof for the general $n$ goes by induction. The
difference between the hermitian and the symmetric cases manifests itself in the fact that the $\xi_\alpha$'s are squares of
complex or real variables, respectively. This gives different estimates for integrals of the type (1.93), resulting
in different exponents in (1.84). □

In this proof we used the following version of the local semicircle law: 

Theorem 1.9 [41, Theorem 3.1] Let $H$ be an $N \times N$ hermitian or symmetric Wigner matrix with a single
entry distribution having a Gaussian decay. Let $\kappa > 0$ and fix an energy $E \in [-2 + \kappa, 2 - \kappa]$. Then there
exist positive constants $C$, $c$, depending only on $\kappa$, and a universal constant $c_1 > 0$ such that the following
hold:

(i) For any $\delta \le c_1 \kappa$ and $N \ge 2$ we have

$$\mathbb{P}\big( |m(E + i\eta) - m_{sc}(E + i\eta)| \ge \delta \big) \le C e^{-c\delta\sqrt{N\eta}} \tag{1.94}$$

for any $K/(N\sqrt{\kappa}) \le \eta \le 1$, where $K$ is a large universal constant.

(ii) Let $\mathcal{N}_{\eta^*}(E) = \mathcal{N}_{I^*}$ denote the number of eigenvalues in the interval $I^* := [E - \eta^*/2, E + \eta^*/2]$. Then
for any $\delta \le c_1 \kappa$ there is a constant $K_\delta$, depending only on $\delta$, such that

$$\mathbb{P}\Big\{ \Big| \frac{\mathcal{N}_{\eta^*}(E)}{N \eta^*} - \varrho_{sc}(E) \Big| \ge \delta \Big\} \le C e^{-c\delta^2 \sqrt{N\eta^*}} \tag{1.95}$$

holds for all $\eta^*$ satisfying $K_\delta/N \le \eta^* \le 1$ and for all $N \ge 2$.

2 Local semicircle law and delocalization 

Each approach that proves bulk universality for generalized Wigner matrices requires first to analyze the 
local density of eigenvalues. The Wigner semicircle law [106] (and its analogue for Wishart matrices, the 
Marchenko-Pastur law [69]) has traditionally been among the first results established on random matrices. 
Typically, however, the empirical density is shown to converge weakly on macroscopic scales, i.e., on intervals 
that contain $O(N)$ eigenvalues. Based upon our results [39, 40, 41, 43, 48, 49, 50], here we show that the
semicircle law holds on much smaller scales as well. In Section 2.2 we follow the formalism of [41], while in
Section 2.3 we use [48, 49]. The former formalism directly aims at the Stieltjes transform, or the trace of the 
resolvent; the latter formalism is designed to establish the semicircle law for individual diagonal elements of 
the resolvent and it also gives an estimate on the off-diagonal elements. The strongest result [50] that holds 
uniformly in the energy parameter is presented in Section 2.4. Finally, in Section 2.5, we indicate how to 
prove delocalization of eigenvectors from local semicircle law. 

2.1 Resolvent formulas 

For definiteness, we present the proof for the hermitian case, but all formulas below carry over to the other
symmetry classes with obvious modifications. We first collect a few useful formulas about resolvents. Their
proofs are elementary results from linear algebra.

Lemma 2.1 Let $A$, $B$, $C$ be $n \times n$, $m \times n$ and $m \times m$ matrices. We define the $(m + n) \times (m + n)$ matrix $D$ as

$$D := \begin{pmatrix} A & B^* \\ B & C \end{pmatrix} \tag{2.1}$$

and the $n \times n$ matrix $\widehat{D}$ as

$$\widehat{D} := A - B^* C^{-1} B. \tag{2.2}$$

Then for any $1 \le i, j \le n$, we have

$$(D^{-1})_{ij} = (\widehat{D}^{-1})_{ij} \tag{2.3}$$

for the corresponding matrix elements. □



Recall that $G_{ij} = G_{ij}(z)$ denotes the matrix element of the resolvent

$$G_{ij} = \Big( \frac{1}{H - z} \Big)_{ij}.$$

Let $G^{(i)}$ denote the resolvent of $H^{(i)}$, which is the $(N-1) \times (N-1)$ minor of $H$ obtained by removing the
$i$-th row and column. Let $a^i = (h_{1i}, h_{2i}, \ldots, h_{Ni})^t$ be the $i$-th column of $H$, sometimes after removing one or
more elements. We always keep the original labelling of the rows and columns, so there will be no confusion:
if $a^i$ is multiplied by a matrix whose $j$-th column and row are removed, then we remove the $j$-th entry from
$a^i$ as well. With similar conventions, we can define $G^{(ij)}$ etc. The superscript in parentheses for resolvents
always means "after removing the corresponding row and column"; in particular, by independence of matrix
elements, this means that the matrix $G^{(ij)}$, say, is independent of the $i$-th and $j$-th row and column of $H$.
This helps to decouple dependencies in formulae.

Using Lemma 2.1 for $n = 1$, $m = N - 1$, we have

$$G_{ii} = \frac{1}{h_{ii} - z - a^i \cdot \frac{1}{H^{(i)} - z}\, a^i} = \frac{1}{h_{ii} - z - a^i \cdot G^{(i)} a^i}, \tag{2.4}$$

where $a^i$ is the $i$-th column with the $i$-th entry $h_{ii}$ removed.
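The one-row expansion $G_{ii} = (h_{ii} - z - a^i \cdot G^{(i)} a^i)^{-1}$ is easy to test numerically. The sketch below does so on a small hermitian matrix with arbitrarily chosen entries; the matrix, the spectral parameter and the helper functions (`inverse`, `resolvent`, `minor`, `one_row_expansion`) are our own illustrative choices, written with the standard library only.

```python
def inverse(a):
    """Invert a square complex matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def resolvent(h, z):
    """G(z) = (H - z)^{-1}."""
    n = len(h)
    return inverse([[h[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)])


def minor(h, i):
    """H^{(i)}: remove the i-th row and column."""
    return [[h[r][c] for c in range(len(h)) if c != i] for r in range(len(h)) if r != i]


def one_row_expansion(h, z, i):
    """Right-hand side 1 / (h_ii - z - a^i . G^{(i)} a^i) of the one-row expansion."""
    a = [complex(h[k][i]) for k in range(len(h)) if k != i]   # i-th column without h_ii
    g = resolvent(minor(h, i), z)
    quad = sum(a[k].conjugate() * g[k][l] * a[l] for k in range(len(a)) for l in range(len(a)))
    return 1.0 / (h[i][i] - z - quad)


# Arbitrary 3 x 3 hermitian test matrix and spectral parameter (illustration only).
H = [[0.5, 0.2 + 0.1j, -0.3j],
     [0.2 - 0.1j, -0.4, 0.6],
     [0.3j, 0.6, 0.1]]
z = 0.3 + 0.5j

G = resolvent(H, z)
errors = [abs(G[i][i] - one_row_expansion(H, z, i)) for i in range(len(H))]
print(max(errors))
```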

For the offdiagonal elements, one has to do a two-row expansion. In this case, let $a^1$ and $a^2$ denote the
first and the second column of $H$ after removing the first and second elements, i.e., $h_{11}, h_{21}$ from the first
column and $h_{12}, h_{22}$ from the second. With the notation $D = H - z$, $B = [a^1, a^2]$ and $C = H^{(12)} - z$ in
Lemma 2.1 for $n = 2$, $m = N - 2$, we can compute the $\widehat{D}$ matrix, which in this case we will call $K^{(12)}$:

$$\widehat{D} = \begin{pmatrix} K_{11}^{(12)} & K_{12}^{(12)} \\ K_{21}^{(12)} & K_{22}^{(12)} \end{pmatrix} =: K^{(12)}, \tag{2.5}$$

where we conveniently introduced

$$K_{ij}^{(12)} := h_{ij} - z\delta_{ij} - a^i \cdot G^{(12)} a^j, \qquad i, j = 1, 2. \tag{2.6}$$

Thus, from Lemma 2.1, we have, e.g.

$$G_{11} = \frac{K_{22}^{(12)}}{K_{22}^{(12)} K_{11}^{(12)} - K_{12}^{(12)} K_{21}^{(12)}} \tag{2.7}$$

and

$$G_{12} = -\frac{K_{12}^{(12)}}{K_{22}^{(12)} K_{11}^{(12)} - K_{12}^{(12)} K_{21}^{(12)}} = -G_{22}\, \frac{K_{12}^{(12)}}{K_{11}^{(12)}} = -G_{22}\, G_{11}^{(2)} K_{12}^{(12)}. \tag{2.8}$$

In the last step we used

$$G_{11}^{(2)} = \frac{1}{K_{11}^{(12)}}, \tag{2.9}$$

which is exactly the one-row expansion (2.4) applied to the minor of $H$ after removing the second
row/column.

There is another set of formulas that express how to compare resolvents of $H$ and $H^{(j)}$; for example, for
any $i \ne j$,

$$G_{ii} = G_{ii}^{(j)} + \frac{G_{ij} G_{ji}}{G_{jj}}. \tag{2.10}$$

This can be easily checked on a two by two matrix and its inverse:

$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad M^{-1} = \frac{1}{\Delta} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \qquad \text{with } \Delta = ad - bc,$$

so checking (2.10), e.g. for $i = 1$, $j = 2$, boils down to the identity

$$\frac{d}{\Delta} = \frac{1}{a} + \frac{bc}{a\Delta}.$$

For larger matrices, one just uses (2.3). Note that in formulas (2.7), (2.8) and (2.9) we already expressed all
resolvents appearing in (2.10) in terms of the matrix elements of the two by two $K^{(12)}$ matrix (2.5), which
can play the role of $M$ above. Similarly one has for any three different indices $i, j, k$ that

$$G_{ij} = G_{ij}^{(k)} + \frac{G_{ik} G_{kj}}{G_{kk}}. \tag{2.11}$$

This identity can be checked on 3 by 3 matrices and then proved by induction in the general case.
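Both comparison formulas, $G_{ii} = G_{ii}^{(k)} + G_{ik}G_{ki}/G_{kk}$ and $G_{ij} = G_{ij}^{(k)} + G_{ik}G_{kj}/G_{kk}$, can also be checked numerically in one loop. The sketch below uses an arbitrary 4 by 4 hermitian matrix; the matrix entries and helper names are illustrative assumptions. Note how the minor keeps the original row and column labels, as in the convention above.

```python
def inverse(a):
    """Invert a square complex matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def resolvent(h, z):
    """G(z) = (H - z)^{-1}."""
    n = len(h)
    return inverse([[h[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)])


def minor(h, k):
    """H^{(k)}: remove the k-th row and column."""
    return [[h[r][c] for c in range(len(h)) if c != k] for r in range(len(h)) if r != k]


# Arbitrary 4 x 4 hermitian test matrix (illustration only).
H = [[0.5, 0.2 + 0.1j, -0.3j, 0.1],
     [0.2 - 0.1j, -0.4, 0.6, 0.25j],
     [0.3j, 0.6, 0.1, -0.2 + 0.2j],
     [0.1, -0.25j, -0.2 - 0.2j, 0.7]]
z = 0.3 + 0.5j

n = len(H)
G = resolvent(H, z)
worst = 0.0
for k in range(n):
    Gk = resolvent(minor(H, k), z)
    labels = [t for t in range(n) if t != k]   # original labels kept by the minor
    for a, i in enumerate(labels):
        for b, j in enumerate(labels):
            rhs = Gk[a][b] + G[i][k] * G[k][j] / G[k][k]
            worst = max(worst, abs(G[i][j] - rhs))

print(worst)
```

The case $i = j$ of the loop is the diagonal formula, the case $i \ne j$ the three-index formula.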
2.2 Semicircle law via resolvents: Sketch of a crude method

In this section we sketch the proof of

Theorem 2.2 Let $z = E + i\eta$, $1/N \ll \eta \ll 1$ and $\kappa := \big| |E| - 2 \big|$. Let $H$ be a Wigner matrix and
$G(z) = (H - z)^{-1}$ its resolvent and set $m(z) := \frac{1}{N} \operatorname{Tr} G(z)$. We assume that the single entry distribution $\nu$
has Gaussian decay ($\vartheta = 2$ in (1.67)). Then we have the following approximation

$$|m(z) - m_{sc}(z)| \lesssim \min\Big\{ \frac{1}{\sqrt{N\eta\kappa}},\ \frac{1}{(N\eta)^{1/4}} \Big\} \tag{2.12}$$

with a very high probability.



We proved the local semicircle law in this form in Proposition 8.1 of [46] for sample covariance matrices
(replacing semicircle with the Marchenko-Pastur distribution), but the same (or even easier) proof applies
to Wigner matrices. The original proof was presented with a Gaussian decay condition, but it can easily
be relaxed to subexponential decay; this affects only the estimate of the probability that the event (2.12) is
violated. [For technical experts: in our previous papers, up to [46], we typically used the Hanson-Wright
theorem [62] to estimate large deviation probabilities of quadratic forms. This gives a very good control for
the tail, but requires Gaussian decay. In our more recent papers we use Lemma 2.12 based upon martingale
inequalities, which requires only subexponential decay, and in fact can be relaxed to polynomial decay as
well, but the tail probability estimate is weaker.]

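For the proof, we start with the identity (2.4) and express $G_{ii}$ as

$$G_{ii} = \frac{1}{h_{ii} - z - \mathbb{E}_i\, a^i \cdot G^{(i)} a^i - Z_i}, \tag{2.13}$$

where we split

$$a^i \cdot G^{(i)} a^i = \mathbb{E}_i\, a^i \cdot G^{(i)} a^i + Z_i, \qquad Z_i := a^i \cdot G^{(i)} a^i - \mathbb{E}_i\, a^i \cdot G^{(i)} a^i$$

into its expectation and fluctuation, where $\mathbb{E}_i$ denotes the expectation with respect to the variables in the $i$-th
column/row. In particular, $G^{(i)}$ is independent of $a^i$, so we need to compute expectations and fluctuations
of quadratic functions.
The expectation is easy:

$$\mathbb{E}_i\, a^i \cdot G^{(i)} a^i = \mathbb{E}_i \sum_{k,l \ne i} \bar a^i_k\, G^{(i)}_{kl}\, a^i_l = \mathbb{E}_i \sum_{k,l \ne i} h_{ik}\, G^{(i)}_{kl}\, h_{li} = \frac{1}{N} \sum_{k \ne i} G^{(i)}_{kk},$$

where in the last step we used that different matrix elements are independent, i.e. $\mathbb{E}_i\, h_{ik} h_{li} = \frac{1}{N} \delta_{kl}$. The
summations always run over all indices from 1 to $N$, apart from those that are explicitly excluded.
Similarly to

$$m(z) = \frac{1}{N} \operatorname{Tr} G(z) = \frac{1}{N} \sum_{k=1}^N G_{kk}(z),$$

we define

$$m^{(i)}(z) := \frac{1}{N-1} \operatorname{Tr} G^{(i)}(z) = \frac{1}{N-1} \sum_{k \ne i} G^{(i)}_{kk}(z),$$

and we have the following lemma to compare the trace of $G$ and $G^{(i)}$: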
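Lemma 2.3 For any $1 \le i \le N$,

$$|m(z) - m^{(i)}(z)| \le \frac{C}{N\eta}, \qquad \eta = \operatorname{Im} z > 0. \tag{2.14}$$

Proof. Let

$$F(x) := \frac{1}{N} \#\{\lambda_j \le x\}, \qquad F^{(i)}(x) := \frac{1}{N-1} \#\{\mu_j \le x\}$$

denote the normalized counting functions of the eigenvalues. The interlacing property of the eigenvalues of
$H$ and $H^{(i)}$ (see (1.89)) in terms of these functions means that

$$\sup_x |N F(x) - (N-1) F^{(i)}(x)| \le 1.$$

Then, after integrating by parts,

$$\Big| m(z) - \Big(1 - \frac{1}{N}\Big) m^{(i)}(z) \Big| = \Big| \int \frac{dF(x)}{x - z} - \Big(1 - \frac{1}{N}\Big) \int \frac{dF^{(i)}(x)}{x - z} \Big| = \frac{1}{N} \Big| \int \frac{N F(x) - (N-1) F^{(i)}(x)}{(x - z)^2}\, dx \Big| \le \frac{1}{N} \int \frac{dx}{|x - z|^2} \le \frac{C}{N\eta}, \tag{2.15}$$

and this, together with the trivial bound $|m^{(i)}| \le \eta^{-1}$, proves (2.14). □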

Returning to (2.13), we have thus

$$G_{ii} = \frac{1}{-z - m(z) + \Omega_i}, \tag{2.16}$$

where

$$\Omega_i := h_{ii} - Z_i + O\Big( \frac{1}{N\eta} \Big). \tag{2.17}$$

Summing up (2.16), we get

$$m = \frac{1}{N} \sum_{i=1}^N \frac{1}{-z - m(z) + \Omega_i}. \tag{2.18}$$

Suppose that $\Omega := \max_i |\Omega_i|$ is small and $|z + m| \ge C > 0$, thus an expansion is possible, i.e.,

$$\frac{1}{-z - m(z) + \Omega_i} = \frac{1}{-z - m(z)} + O(\Omega).$$

Then we have the following self-consistent equation for $m$:

$$m + \frac{1}{z + m} = O(\Omega). \tag{2.19}$$

We recall that the Stieltjes transform of the semicircle law

$$m_{sc}(z) = \int_{\mathbb{R}} \frac{\varrho_{sc}(x)\, dx}{x - z}$$

can be characterized as the only solution to the quadratic equation

$$m_{sc}(z) + \frac{1}{z + m_{sc}(z)} = 0 \tag{2.20}$$

with $\operatorname{Im} m_{sc}(z) > 0$ for $\operatorname{Im} z > 0$. We can thus use the stability of the equation (2.20) to identify $m$, the
solution to (2.19). The stability deteriorates near the spectral edges $z \sim \pm 2$ and we have the following
precise result that can be proved by elementary calculus:



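Lemma 2.4 [46, Lemma 8.4] Fix $z = E + i\eta$, $\eta > 0$, and set $\kappa := \big| |E| - 2 \big|$. Suppose that $m = m(z)$ has a
positive imaginary part and

$$\Big| m + \frac{1}{z + m} \Big| \le \delta.$$

Then

$$|m - m_{sc}(z)| \le \frac{C\delta}{\sqrt{\kappa + \delta}}.$$

□

Applying this result, we obtain

$$|m(z) - m_{sc}(z)| \le \frac{C\Omega}{\sqrt{\kappa + \Omega}}. \tag{2.21}$$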
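The characterization of $m_{sc}$ and the edge deterioration of the stability can be explored numerically. The sketch below computes $m_{sc}(z)$ as the root of $m^2 + zm + 1 = 0$ with positive imaginary part, compares it with a direct quadrature of $\int \varrho_{sc}(x)(x - z)^{-1} dx$, and then solves the perturbed equation $m + 1/(z + m) = \delta$ for a few real $\delta$ to compare $|m - m_{sc}|$ with $\delta/\sqrt{\kappa + \delta}$. The form of the stability bound used for comparison, the sample values of $E$, $\eta$, $\delta$ and all function names are our assumptions for illustration.

```python
import cmath
import math


def upper_root(b, c):
    """Root of m^2 + b*m + c = 0 with the larger imaginary part."""
    s = cmath.sqrt(b * b - 4 * c)
    r1, r2 = (-b + s) / 2, (-b - s) / 2
    return r1 if r1.imag >= r2.imag else r2


def m_sc(z):
    """Solution of m + 1/(z + m) = 0, i.e. m^2 + z*m + 1 = 0, with Im m > 0."""
    return upper_root(z, 1.0)


def m_sc_quadrature(z, n=4000):
    """Integral of rho_sc(x)/(x - z) over [-2, 2]: midpoint rule after x = 2*cos(theta)."""
    step = math.pi / n
    total = 0j
    for k in range(n):
        theta = (k + 0.5) * step
        total += (2.0 / math.pi) * math.sin(theta) ** 2 / (2.0 * math.cos(theta) - z)
    return total * step


def perturbed(z, delta):
    """Solution of m + 1/(z + m) = delta with Im m > 0 (delta real and small)."""
    # m + 1/(z + m) = delta  <=>  m^2 + (z - delta)*m + (1 - delta*z) = 0
    return upper_root(z - delta, 1.0 - delta * z)


z = complex(0.5, 0.1)
print(abs(m_sc(z) + 1.0 / (z + m_sc(z))))   # residual of the quadratic equation
print(abs(m_sc(z) - m_sc_quadrature(z)))    # closed form versus integral

eta = 1e-3
ratios = []
for E in (0.0, 1.9, 2.0, 2.1):
    kappa = abs(abs(E) - 2.0)
    for delta in (1e-1, 1e-2, 1e-3):
        w = complex(E, eta)
        err = abs(perturbed(w, delta) - m_sc(w))
        ratios.append(err / (delta / math.sqrt(kappa + delta)))
print(max(ratios))                          # stays of order one, also at the edge
```

In the bulk the error is of order $\delta$, while at the edge ($\kappa = 0$) it is of order $\sqrt{\delta}$, which is the loss expressed by the denominator.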



We now give a rough bound on the size of $\Omega$. Clearly $|h_{ii}| \lesssim N^{-1/2}$. If the single entry distribution has
subexponential decay, then we can guarantee that all diagonal elements simultaneously satisfy essentially
this bound with a very high probability. Recall that (1.67) implies

$$\mathbb{P}\big( |h_{ii}| \ge M^\alpha N^{-1/2} \big) \le C e^{-M}$$

for each $i$. Choosing $M = (\log N)^{1+\varepsilon}$, we have

$$\mathbb{P}\big( \exists\, i : |h_{ii}| \ge (\log N)^{(1+\varepsilon)\alpha} N^{-1/2} \big) \le C N e^{-(\log N)^{1+\varepsilon}}, \tag{2.22}$$

which is faster than any polynomial decay.

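To estimate $Z_i$, we compute its second moment

$$\mathbb{E}|Z_i|^2 = \sum_{k,l \ne i}\ \sum_{k',l' \ne i} \mathbb{E}\Big( \big[ h_{ik} G^{(i)}_{kl} h_{li} - \mathbb{E}_i h_{ik} G^{(i)}_{kl} h_{li} \big]\, \overline{\big[ h_{ik'} G^{(i)}_{k'l'} h_{l'i} - \mathbb{E}_i h_{ik'} G^{(i)}_{k'l'} h_{l'i} \big]} \Big). \tag{2.23}$$

Since $\mathbb{E} h = 0$, the non-zero contributions to this sum come from index combinations when all $h$ and $\bar h$ are
paired. For pedagogical simplicity, assume that $\mathbb{E} h^2 = 0$; this can be achieved, for example, if the distributions
of the real and imaginary parts are the same. Then the $h$ factors in the above expression have to be paired in
such a way that $h_{ik} = h_{ik'}$ and $h_{il} = h_{il'}$, i.e., $k = k'$, $l = l'$. Note that pairing $h_{ik} = h_{il}$ would give zero
because the expectation is subtracted. The result is

$$\mathbb{E}|Z_i|^2 = \frac{1}{N^2} \sum_{k,l \ne i} |G^{(i)}_{kl}|^2 + \frac{m_4 - 2}{N^2} \sum_{k \ne i} |G^{(i)}_{kk}|^2, \tag{2.24}$$

where $m_4 = \mathbb{E}|\sqrt{N} h|^4$ is the fourth moment of the single entry distribution. The first term can be computed as

$$\frac{1}{N^2} \sum_{k,l \ne i} |G^{(i)}_{kl}|^2 = \frac{1}{N^2} \sum_{k \ne i} \big( |G^{(i)}|^2 \big)_{kk} = \frac{1}{N\eta} \cdot \frac{1}{N} \sum_{k \ne i} \operatorname{Im} G^{(i)}_{kk} \le \frac{1}{N\eta} \operatorname{Im} m^{(i)}, \tag{2.25}$$

where $|G|^2 = G G^*$ and we used the identity

$$\big( |G|^2 \big)_{kk} = \sum_l |G_{kl}|^2 = \frac{1}{\eta} \operatorname{Im} G_{kk}.$$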
To estimate this quantity, we need an upper bound on the local density on scale $\eta \gg 1/N$. For any
interval $I \subset \mathbb{R}$, let $\mathcal{N}_I := \#\{\lambda_j \in I\}$ denote the number of eigenvalues in $I$. Lemma 1.8 from Section 1.7
shows that $\mathcal{N}_I \lesssim N|I|$ with a very high probability. Using this lemma for the matrix $H^{(i)}$ with eigenvalues
$\mu_1, \mu_2, \ldots, \mu_{N-1}$, we have

$$\operatorname{Im} m^{(i)}(z) = \frac{1}{N-1} \sum_{\alpha=1}^{N-1} \frac{\eta}{(\mu_\alpha - E)^2 + \eta^2} \lesssim 1. \tag{2.26}$$

Recall that the sign $\lesssim$ here means "up to $\log N$ factors".

The second term in (2.24) we can estimate by using the trivial bound $|G^{(i)}_{kk}| \le \eta^{-1}$ and thus

$$\frac{1}{N^2} \sum_{k \ne i} |G^{(i)}_{kk}|^2 \le \frac{1}{N^2 \eta} \sum_{k \ne i} |G^{(i)}_{kk}| \le \frac{1}{N^2 \eta} \sum_{k \ne i} \sum_{\alpha=1}^{N-1} \frac{|u_\alpha(k)|^2}{|\mu_\alpha - z|} \le \frac{1}{N\eta} \cdot \frac{1}{N} \sum_{\alpha=1}^{N-1} \frac{1}{|\mu_\alpha - z|} \lesssim \frac{1}{N\eta}, \tag{2.27}$$

where $u_\alpha$ is the (normalized) eigenvector to $\mu_\alpha$ and in the last step we used an estimate similar to (2.26).
The estimates (2.25) and (2.27) confirm that the size of $Z_i$ is roughly

$$|Z_i| \lesssim \frac{1}{\sqrt{N\eta}}, \tag{2.28}$$

at least in second moment sense. One can compute higher moments or use even stronger concentration
results (if, for example, a logarithmic Sobolev inequality is available), to strengthen (2.28) so that it holds in
the sense of probability.

Altogether (2.17), (2.22) and (2.28) give

$$\Omega \lesssim \frac{1}{\sqrt{N}} + \frac{1}{\sqrt{N\eta}} + \frac{1}{N\eta}.$$

Since we are interested in $1/N \ll \eta \le 1$, we get that $\Omega \lesssim (N\eta)^{-1/2}$. Combining this with the stability
bound, (2.21), we have

$$|m(z) - m_{sc}(z)| \lesssim \min\Big\{ \frac{1}{\sqrt{N\eta\kappa}},\ \frac{1}{(N\eta)^{1/4}} \Big\},$$

which proves Theorem 2.2. □

The remarkable feature is that the method works down to scales $\eta \gg 1/N$. The factor $\kappa$ expresses the
fact that the estimate deteriorates near the edge. The exponents are not optimal; $m - m_{sc}$ can be compared
with a precision of order $(N\eta)^{-1}$. This will be presented in the next section. The gain will come from the
fact that the main error term $Z_i$ in (2.17) is fluctuating and it is possible to show [48] that its contributions
to $m - m_{sc}$ cancel to leading order and it is smaller than the size of $Z_i$ predicted by the variance calculation
(this effect was first exploited in Theorem 4.1 of [43] and substantially improved in [48] and [49]).

We emphasize that the presentation was very sketchy; many technical issues were neglected.
2.3 Semicircle via resolvents: refined method
2.3.1 Statement of the theorem and consequences

In this section we present a more refined method that can estimate matrix elements of the resolvent and it also
applies to universal Wigner matrices (Definition 1.1). This is also a key ingredient for the improved precision
on the semicircle law both in terms of $(N\eta)$-power and edge behavior. The main ingredient is to analyze a
self-consistent equation for the vector of the diagonal elements of the resolvent, $(G_{11}, G_{22}, \ldots, G_{NN})$, instead
of their sum which has led to (2.18). Again, for definiteness, we formulate the result for generalized hermitian
Wigner matrices only; the extension to symmetric matrices is straightforward.

A key quantity will be the matrix of variances, $\Sigma$, introduced in (1.15). Recall that $\Sigma$ is symmetric,
doubly stochastic by (1.16), and in particular it satisfies $-1 \le \Sigma \le 1$. Let the spectrum of $\Sigma$ be supported
in

$$\operatorname{Spec}(\Sigma) \subset [-1 + \delta_-, 1 - \delta_+] \cup \{1\} \tag{2.29}$$

with some nonnegative constants $\delta_\pm$. We will always have the following spectral assumption:

$$\text{1 is a simple eigenvalue of } \Sigma \text{ and } \delta_- \text{ is a positive constant, independent of } N. \tag{2.30}$$

For Wigner matrices, all entries of $\Sigma$ are identical and $\delta_\pm = 1$. It is easy to prove (see Lemma A.1 of
[48]) that (2.30) holds for random band matrices, see (1.18) for the definition, with $\delta_- > 0$, depending only
on $f$. For generalized Wigner matrices, i.e., Wigner matrices with comparable variances, see (1.17) for the
definition, it is easy to check that

$$\delta_\pm \ge C_{inf} > 0.$$

The fact that $\delta_+ > 0$ for generalized Wigner matrices allows better control in terms of the edge behavior of
the estimates. This is the main reason why the statement below is different for universal Wigner matrices
(see (1.1)) and for generalized Wigner matrices, (1.17).

The precision of the local semicircle law depends on three factors. The first factor is the resolution (scale
on which the semicircle holds); this is given by the imaginary part $\eta = \operatorname{Im} z$ of the spectral parameter $z$ in
the Stieltjes transform. The second factor is the distance to the edge, measured by $\kappa = \big| |E| - 2 \big|$. The last
factor is the size of the typical matrix elements, measured by the quantity

$$M := \frac{1}{\max_{ij} \sigma_{ij}^2}, \tag{2.31}$$

called the spread of the matrix. For example, for Wigner matrices $M = N$, for generalized Wigner matrices,
(1.1), $M \sim N$ and for random band matrices (1.18) we have $M \sim W$.
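To make the spread $M = 1/\max_{ij} \sigma_{ij}^2$ concrete, the sketch below evaluates it for two variance profiles whose rows sum to one: the flat Wigner profile $\sigma_{ij}^2 = 1/N$, and a simple periodic band profile of half-width $W$. The band profile used here is a hypothetical example chosen for simplicity, not the definition (1.18) of the text.

```python
def band_variances(n, w):
    """Hypothetical periodic band profile: sigma_ij^2 = 1/(2w+1) if |i-j| <= w (mod n), else 0."""
    def dist(i, j):
        d = abs(i - j)
        return min(d, n - d)
    return [[1.0 / (2 * w + 1) if dist(i, j) <= w else 0.0 for j in range(n)]
            for i in range(n)]


def spread(variances):
    """M = 1 / max_ij sigma_ij^2."""
    return 1.0 / max(max(row) for row in variances)


n, w = 12, 2
wigner = [[1.0 / n] * n for _ in range(n)]   # all variances equal to 1/N
band = band_variances(n, w)

print(spread(wigner))                        # equals N up to rounding
print(spread(band))                          # equals 2W + 1, i.e. comparable to W
print([sum(row) for row in band])            # each row sums to one
```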

Theorem 2.5 (Local semicircle law for universal Wigner matrices) [49, Theorem 2.1] Let $H$ be a
hermitian $N \times N$ random matrix with $\mathbb{E}\, h_{ij} = 0$, $1 \le i, j \le N$, and assume that the variances $\sigma_{ij}^2$ satisfy
(1.16) and (2.30). Suppose that the distributions of the matrix elements have a uniformly subexponential
decay in the sense that there exist constants $C$, $\vartheta > 0$, independent of $N$, such that for any $x > 0$ and for
each $(i, j)$ we have

$$\mathbb{P}\big( |h_{ij}| \ge x |\sigma_{ij}| \big) \le C \exp(-x^\vartheta). \tag{2.32}$$

We consider universal Wigner matrices and its special class, the generalized Wigner matrices, in parallel.
The parameter $A$ will distinguish between the two cases; we set $A = 2$ for universal Wigner matrices, and
$A = 1$ for generalized Wigner matrices, where the results will be stronger.
Define the following domain in $\mathbb{C}$

$$D := \Big\{ z = E + i\eta \in \mathbb{C} : |E| \le 5,\ 0 < \eta \le 10,\ \sqrt{M\eta} \ge (\log N)^{C_1} (\kappa + \eta)^{\frac14 - A} \Big\}, \tag{2.33}$$

where $\kappa := \big| |E| - 2 \big|$. Then there exist constants $C_1$, $C_2$, and $c > 0$, depending only on $\vartheta$ and $\delta_-$ in (2.30),
such that for any $\varepsilon > 0$ and $K > 0$ the Stieltjes transform of the empirical eigenvalue distribution of $H$
satisfies

$$\mathbb{P}\Big( \bigcup_{z \in D} \Big\{ |m(z) - m_{sc}(z)| \ge \frac{N^\varepsilon}{M\eta\, (\kappa + \eta)^A} \Big\} \Big) \le \frac{C(\varepsilon, K)}{N^K} \tag{2.34}$$

for sufficiently large $N$. The diagonal matrix elements of the Green function $G_{ii}(z) = (H - z)^{-1}(i, i)$ satisfy

$$\mathbb{P}\Big( \bigcup_{z \in D} \Big\{ \max_i |G_{ii}(z) - m_{sc}(z)| \ge \frac{(\log N)^{C_2}}{\sqrt{M\eta}}\, (\kappa + \eta)^{\frac14 - \frac{A}{2}} \Big\} \Big) \le C N^{-c(\log\log N)}, \tag{2.35}$$

and for the off-diagonal elements we have

$$\mathbb{P}\Big( \bigcup_{z \in D} \Big\{ \max_{i \ne j} |G_{ij}(z)| \ge \frac{(\log N)^{C_2}}{\sqrt{M\eta}}\, (\kappa + \eta)^{\frac14} \Big\} \Big) \le C N^{-c(\log\log N)} \tag{2.36}$$

for any sufficiently large $N$.



Remark 1. These estimates are optimal in the power of $M\eta$, but they are not optimal as far as the edge
behavior (power of $\kappa$) and the control on the probability is concerned. Under stronger decay assumptions on
the single entry distributions it is possible to get subexponential bounds on the probability, e.g. Theorem
1.9 (see Theorem 3.1 [41]). On the other hand, the subexponential decay condition (2.32) can be easily
weakened if we are not aiming at error estimates faster than any power law of $N$.

Remark 2. Concerning the edge behavior, we remark that in our first two papers [39, 40] we simply assumed
$\kappa \ge \kappa_0$ for some positive constant $\kappa_0$. The edge behavior was effectively treated first in [41] and substantially
improved in Theorem 4.1 of [43], but the bounds were not optimal. The best result for universal Wigner
matrices is Theorem 2.5. For generalized Wigner matrices, Theorem 2.19 (proved in [50]) gives an optimal
estimate uniform in $\kappa$.

The local semicircle estimates imply that the empirical counting function of the eigenvalues is close to the
semicircle counting function (Theorem 2.6) and that the locations of the eigenvalues are close to their classical
locations in the mean square deviation sense (Theorem 2.7). This latter result will be used to verify (1.66), or
more precisely Assumption III (3.9) later, that will be the key input to our hydrodynamical approach for
universality.

To formulate these statements precisely, let $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_N$ be the ordered eigenvalues of a universal
Wigner matrix. We define the normalized empirical counting function by

$$n(E) := \frac{1}{N} \#\{\lambda_j \le E\} \tag{2.37}$$

and the averaged counting function by

$$\bar n(E) = \frac{1}{N}\, \mathbb{E}\, \#[\lambda_j \le E]. \tag{2.38}$$

Finally, let

$$n_{sc}(E) := \int_{-\infty}^{E} \varrho_{sc}(x)\, dx \tag{2.39}$$

be the distribution function of the semicircle law, which is very close to the counting function of the $\gamma$'s,
$\frac{1}{N} \#[\gamma_j \le E] \approx n_{sc}(E)$. Recall that the $\gamma_j$'s are the classical locations of the eigenvalues, determined by the
semicircle law, see (1.65).
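For the semicircle density $\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{4 - x^2}$ the distribution function has the closed form $n_{sc}(E) = \frac12 + \frac{E\sqrt{4 - E^2}}{4\pi} + \frac{1}{\pi}\arcsin(E/2)$ on $[-2, 2]$. The sketch below evaluates it and computes classical locations by bisection, assuming the convention $n_{sc}(\gamma_j) = j/N$ (we take this to be the content of (1.65); the function names are ours).

```python
import math


def n_sc(E):
    """Distribution function of the semicircle law on [-2, 2]."""
    if E <= -2.0:
        return 0.0
    if E >= 2.0:
        return 1.0
    return 0.5 + E * math.sqrt(4.0 - E * E) / (4.0 * math.pi) + math.asin(E / 2.0) / math.pi


def classical_location(j, N, steps=100):
    """Bisection for gamma_j with n_sc(gamma_j) = j/N (assumed convention)."""
    lo, hi = -2.0, 2.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if n_sc(mid) < j / N:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


N = 50
gammas = [classical_location(j, N) for j in range(1, N + 1)]

E = 0.3
counting = sum(1 for g in gammas if g <= E) / N   # counting function of the gamma's
print(counting, n_sc(E))                          # the two differ by at most 1/N
```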

With these notations, we have the following theorems: 

Theorem 2.6 [49, Theorem 6.3] Let $A = 2$ for universal Wigner matrices, satisfying (1.16), (2.31) and
(2.32) with $M \ge (\log N)^{24 + 6\alpha}$. For generalized Wigner matrices, satisfying (1.16), (2.31), (1.17) and (2.32),
we set $A = 1$ and recall $M = N$ in this case. Then for any $\varepsilon > 0$ and $K \ge 1$ there exists a constant $C(\varepsilon, K)$
such that

$$\mathbb{P}\Big\{ \sup_{|E| \le 3} |n(E) - n_{sc}(E)|\, [\kappa_E]^A \le \frac{C N^\varepsilon}{M} \Big\} \ge 1 - \frac{C(\varepsilon, K)}{N^K},$$

where $n(E)$ and $n_{sc}(E)$ were defined in (2.37) and (2.39) and $\kappa_E = \big| |E| - 2 \big|$.

Theorem 2.7 [49, Theorem 7.1] Let $H$ be a generalized Wigner matrix with subexponential decay, i.e.,
assume that (1.16), (2.31), (1.17) and (2.32) hold. Let $\lambda_j$ denote the eigenvalues of $H$ and $\gamma_j$ be their
semiclassical location, defined by (1.65). Then for any $\varepsilon < 1/7$ and for any $K > 1$ there exists a constant
$C_K$ such that

$$\mathbb{P}\Big\{ \sum_{j=1}^N |\lambda_j - \gamma_j|^2 \le N^{-\varepsilon} \Big\} \ge 1 - \frac{C_K}{N^K} \tag{2.40}$$

and

$$\sum_{j=1}^N \mathbb{E}\, |\lambda_j - \gamma_j|^2 \le C N^{-\varepsilon}. \tag{2.41}$$

These theorems are consequences of the local semicircle law, Theorem 2.5. We will not give the detailed
proof, but we mention a useful formula that allows one to translate Stieltjes transforms to densities. In this
context this formula first appeared in [43].

Lemma 2.8 [21, Helffer-Sjöstrand formula] Let $f$ be a real valued function on $\mathbb{R}$. Let $\chi(y)$ be a smooth
cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for $|y| \le 1/2$ and with bounded derivatives. Let

$$\tilde f(x + iy) := (f(x) + iy f'(x))\, \chi(y),$$

then

$$f(\lambda) = \frac{1}{2\pi} \int_{\mathbb{R}^2} \frac{\partial_{\bar z} \tilde f(x + iy)}{\lambda - x - iy}\, dx\, dy = \frac{1}{2\pi} \int_{\mathbb{R}^2} \frac{iy f''(x) \chi(y) + i (f(x) + iy f'(x)) \chi'(y)}{\lambda - x - iy}\, dx\, dy. \tag{2.42}$$

We will apply this lemma in the following form. Let $\varrho \in L^1(\mathbb{R})$ be a real function and let $m(z)$ be its
Stieltjes transform

$$m(z) := \int_{\mathbb{R}} \frac{\varrho(\lambda)\, d\lambda}{\lambda - z}.$$

Since $f$ is real, we have

$$\Big| \int_{\mathbb{R}} f(\lambda) \varrho(\lambda)\, d\lambda \Big| = \Big| \operatorname{Re} \int_{\mathbb{R}} f(\lambda) \varrho(\lambda)\, d\lambda \Big| = \Big| \operatorname{Re} \frac{1}{2\pi} \int_{\mathbb{R}^2} \partial_{\bar z} \tilde f(x + iy)\, m(x + iy)\, dx\, dy \Big|$$
$$\le \Big| \frac{1}{2\pi} \int_{\mathbb{R}^2} y f''(x) \chi(y) \operatorname{Im} m(x + iy)\, dx\, dy \Big| + C \int_{\mathbb{R}^2} \big( |f(x)| + |y| |f'(x)| \big) |\chi'(y)|\, |m(x + iy)|\, dx\, dy. \tag{2.43}$$

In order to get the counting function, we will choose

$$f(\lambda) = f_{E,\eta}(\lambda),$$

where $f_{E,\eta}$ is the characteristic function of the semi-axis $(-\infty, E)$, smoothed out on a scale $\eta$ (i.e., $f_{E,\eta}(\lambda) = 1$
for $\lambda \le E - \eta$, $f_{E,\eta}(\lambda) = 0$ for $\lambda \ge E + \eta$, and $|f'| \le C\eta^{-1}$, $|f''| \le C\eta^{-2}$ in the interval $[E - \eta, E + \eta]$).

The second term in (2.43) is typically harmless, since $\chi'$ is supported at $|y| \ge 1/2$, i.e. this term requires
information on the Stieltjes transform far away from the real axis. In the first term, we have only the
imaginary part of the Stieltjes transform and it is easy to see that

$$y\, |\operatorname{Im} m(x + iy)| \le \int |\varrho(x)|\, dx.$$

One can perform an integration by parts bringing the second derivative of $f$ down to the first derivative,
which, after integration, is under control even if $\eta$ is very small.

Using these ideas for the measure $\varrho$ being the difference of the empirical density and $\varrho_{sc}$, one can control
$\int f(\lambda) \varrho(\lambda)\, d\lambda$, i.e. essentially the difference of the counting functions, in terms of the size of $m - m_{sc}$. For the
details, see [43] and [49].

2.3.2 Sketch of the proof of the semicircle law for matrix elements 

For pedagogical reasons we will neglect the edge problem for the presentation, i.e., we will assume that
$E = \operatorname{Re} z$ always satisfies $\kappa = \big| |E| - 2 \big| \ge \kappa_0$ for some fixed $\kappa_0 > 0$ and we do not follow the dependence of
the constants on $\kappa_0$. We will thus prove the following partial version of Theorem 2.5:

Theorem 2.9 Assume the conditions of Theorem 2.5. For some $\kappa_0 > 0$, define the following domain in $\mathbb{C}$

$$D := D_{\kappa_0} = \Big\{ z = E + i\eta \in \mathbb{C} : |z| \le Q,\ \eta > 0,\ \big| |E| - 2 \big| \ge \kappa_0,\ \sqrt{M\eta} \ge (\log N)^{C_1} \Big\}, \tag{2.44}$$

where $C_1$ and $Q$ are sufficiently large. Then there exist constants $C_1$, $C_2$, $C$ and $c > 0$, depending only on
$\vartheta$, $\kappa_0$ and $\delta_-$ in (2.30), such that the diagonal matrix elements of the Green function $G_{ii}(z)$ satisfy

$$\mathbb{P}\Big( \bigcup_{z \in D} \Big\{ \max_i |G_{ii}(z) - m_{sc}(z)| \ge \frac{(\log N)^{C_2}}{\sqrt{M\eta}} \Big\} \Big) \le C N^{-c(\log\log N)}, \tag{2.45}$$

and for the off-diagonal elements we have

$$\mathbb{P}\Big( \bigcup_{z \in D} \Big\{ \max_{i \ne j} |G_{ij}(z)| \ge \frac{(\log N)^{C_2}}{\sqrt{M\eta}} \Big\} \Big) \le C N^{-c(\log\log N)} \tag{2.46}$$

for any sufficiently large $N$.

We start with a system of self-consistent equations for the diagonal matrix elements of the resolvent. The
following lemma is a simple combination of the resolvent identities from Section 2.1.

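Lemma 2.10 The diagonal resolvent matrix elements $G_{ii} = (H - z)^{-1}(i, i)$ satisfy the following system of
self-consistent equations

$$G_{ii} = \frac{1}{-z - \sum_j \sigma_{ij}^2 G_{jj} + \Upsilon_i}, \tag{2.47}$$

where

$$\Upsilon_i := A_i + h_{ii} - Z_i \tag{2.48}$$

with

$$A_i := \sigma_{ii}^2 G_{ii} + \sum_{j \ne i} \sigma_{ij}^2\, \frac{G_{ij} G_{ji}}{G_{ii}}, \tag{2.49}$$

$$Z_i := Z_{ii}^{(i)} - \mathbb{E}_i Z_{ii}^{(i)}, \qquad Z_{ii}^{(i)} := a^i \cdot G^{(i)} a^i = \sum_{k,l \ne i} \bar a^i_k\, G^{(i)}_{kl}\, a^i_l. \tag{2.50}$$

Proof. Introduce

$$Z_{ij}^{(ij)} := a^i \cdot G^{(ij)} a^j = \sum_{k,l \ne i,j} \bar a^i_k\, G^{(ij)}_{kl}\, a^j_l$$

and recall from (2.6)

$$K_{ij}^{(ij)} = h_{ij} - z\delta_{ij} - Z_{ij}^{(ij)}.$$

We can write $G_{ii}$ as follows (2.9)

$$G_{ii} = \big( K_{ii}^{(i)} \big)^{-1} = \frac{1}{\mathbb{E}_i K_{ii}^{(i)} + K_{ii}^{(i)} - \mathbb{E}_i K_{ii}^{(i)}}, \tag{2.51}$$

where $\mathbb{E}_i$ denotes the expectation with respect to the elements in the $i$-th column of the matrix $H$.
Using the fact that $G^{(i)} = (H^{(i)} - z)^{-1}$ is independent of $a^i$ and $\mathbb{E}_i\, \bar a^i_k a^i_l = \delta_{kl} \sigma_{ik}^2$, we obtain

$$\mathbb{E}_i K_{ii}^{(i)} = -z - \sum_{j \ne i} \sigma_{ij}^2 G^{(i)}_{jj}$$

and

$$K_{ii}^{(i)} - \mathbb{E}_i K_{ii}^{(i)} = h_{ii} - Z_i. \tag{2.52}$$

Use

$$G_{kl} = G_{kl}^{(i)} + \frac{G_{ki} G_{il}}{G_{ii}}$$

from (2.10) and the notation from (2.49) to express

$$\mathbb{E}_i K_{ii}^{(i)} = -z - \sum_j \sigma_{ij}^2 G_{jj} + A_i.$$

Combining this with (2.52), from (2.51) we eventually obtain (2.47). □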
Introduce the notations

$$v_i := G_{ii} - m_{sc}, \qquad m := \frac{1}{N} \sum_i G_{ii}, \qquad [v] := \frac{1}{N} \sum_i v_i = \frac{1}{N} \sum_i (G_{ii} - m_{sc}).$$

We will estimate the following key quantities

$$\Lambda_d := \max_k |v_k| = \max_k |G_{kk} - m_{sc}|, \qquad \Lambda_o := \max_{k \ne l} |G_{kl}|, \tag{2.53}$$

where the subscripts refer to "diagonal" and "offdiagonal" matrix elements. All the quantities defined so far
depend on the spectral parameter $z = E + i\eta$, but we will mostly omit this fact from the notation. The real
part $E$ will always be kept fixed. For the imaginary part we will use a continuity argument at the end of
the proof and then the dependence of $\Lambda_{d,o}$ on $z$ will be indicated.

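Both quantities $\Lambda_d$ and $\Lambda_o$ will be typically small for $z \in D$; eventually we will prove that their size is
less than $(M\eta)^{-1/2}$, modulo logarithmic corrections. We thus define the exceptional event

$$\Omega_\Lambda = \Omega_\Lambda(z) := \big\{ \Lambda_d(z) + \Lambda_o(z) \ge (\log N)^{-C} \big\}$$

with some $C$ (in this presentation, we will not care about matching all exponents). We will always work in
$\Omega_\Lambda^c$, and, in particular, we will have

$$\Lambda_d(z) + \Lambda_o(z) \le 1.$$

It is easy to check from the explicit formula on $m_{sc}$ that

$$c \le |m_{sc}(z)| \le C, \qquad z \in D, \tag{2.54}$$

with some positive constants, so from $G_{ii} = m_{sc}(z) + O(\Lambda_d)$ we have

$$c \le |G_{ii}(z)| \le C, \qquad z \in D. \tag{2.55}$$

Recalling (2.11)

$$G_{kl} = G_{kl}^{(i)} + \frac{G_{ki} G_{il}}{G_{ii}},$$

together with (2.55), it implies that for any $i$ and $z \in D$

$$\max_{k \ne l} |G_{kl}^{(i)}| \le \Lambda_o + C\Lambda_o^2 \le C\Lambda_o \qquad \text{in } \Omega_\Lambda^c,$$

$$c \le |G_{kk}^{(i)}| \le C \qquad \text{for all } k \ne i \text{ and in } \Omega_\Lambda^c, \tag{2.56}$$

$$|G_{kk}^{(i)} - m_{sc}| \le \Lambda_d + C\Lambda_o^2 \qquad \text{for all } k \ne i \text{ and in } \Omega_\Lambda^c, \tag{2.57}$$

and (see (2.49))

$$|A_i| \le \frac{C}{M} + C\Lambda_o^2 \qquad \text{in } \Omega_\Lambda^c. \tag{2.58}$$

Similarly, with one more expansion step and still for $z \in D$, we get

$$\max_{ij} \max_{k \ne l} |G_{kl}^{(ij)}| \le C\Lambda_o, \qquad \max_{ij} \max_k |G_{kk}^{(ij)}| \le C \qquad \text{in } \Omega_\Lambda^c$$

and

$$|G_{kk}^{(ij)} - m_{sc}| \le \Lambda_d + C\Lambda_o^2 \qquad \text{for all } k \ne i, j \text{ and in } \Omega_\Lambda^c. \tag{2.59}$$

Using these estimates, the following lemma shows that $Z_i$ and $Z_{ij}^{(ij)}$ are small assuming $\Lambda_d + \Lambda_o$ is small
and the $h_{ij}$'s are not too large. These bounds hold uniformly in $D$.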

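Lemma 2.11 Define the exceptional events

$$\Omega_h := \Big\{ \max_{1 \le i, j \le N} |h_{ij}| \ge (\log N)^C |\sigma_{ij}| \Big\},$$
$$\Omega_d(z) := \Big\{ \max_i |Z_i(z)| \ge \frac{(\log N)^C}{\sqrt{M\eta}} \Big\}, \qquad \Omega_o(z) := \Big\{ \max_{i \ne j} |Z_{ij}^{(ij)}(z)| \ge \frac{(\log N)^C}{\sqrt{M\eta}} \Big\}, \tag{2.60}$$

and we let

$$\Omega := \Omega_h \cup \bigcup_{z \in D} \Big[ \big( \Omega_d(z) \cup \Omega_o(z) \big) \cap \Omega_\Lambda^c(z) \Big] \tag{2.61}$$

to be the set of all exceptional events. Then we have

$$\mathbb{P}(\Omega) \le C N^{-c(\log\log N)}. \tag{2.62}$$

Proof. Under the assumption of (2.32), analogously to (2.22), we have

$$\mathbb{P}(\Omega_h) \le C N^{-c \log\log N}, \tag{2.63}$$

so we can work in the complement set $\Omega_h^c$ and assume that

$$\max_{ij} |h_{ij}| \le \frac{(\log N)^C}{\sqrt{M}}. \tag{2.64}$$

We now prove that for any fixed $z \in D$, we have

$$\mathbb{P}\Big( \Omega_\Lambda^c(z) \cap \Big\{ \max_i |Z_i(z)| \ge \frac{(\log N)^C}{\sqrt{M\eta}} \Big\} \Big) \le C N^{-c \log\log N} \tag{2.65}$$

and

$$\mathbb{P}\Big( \Omega_\Lambda^c(z) \cap \Big\{ \max_{i \ne j} |Z_{ij}^{(ij)}(z)| \ge \frac{(\log N)^C}{\sqrt{M\eta}} \Big\} \Big) \le C N^{-c \log\log N}. \tag{2.66}$$

Recall that $Z_{ij}^{(ij)}$ is a quadratic form in the components of the random vectors $a^i$ and $a^j$. For such functions,
we have the following general large deviation result. The proof relies on the Burkholder martingale inequality
and it will be given in Appendix A.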

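Lemma 2.12 [48, Lemma B.1, B.2] Let $a_i$ ($1 \le i \le N$) be $N$ independent random complex variables with
mean zero, variance $\sigma^2$ and having the uniform subexponential decay

$$P(|a_i| \ge x^\alpha \sigma) \le C e^{-x} \tag{2.67}$$

for some positive $\alpha$ and for all $x$. Let $A_i, B_{ij} \in \mathbb{C}$ ($1 \le i,j \le N$). Then we have that

$$P\Big\{\Big|\sum_{i=1}^N a_i A_i\Big| \ge (\log N)^{\frac32+\alpha}\,\sigma\Big(\sum_i |A_i|^2\Big)^{1/2}\Big\} \le C N^{-\log\log N}, \tag{2.68}$$

$$P\Big\{\Big|\sum_{i=1}^N \bar a_i B_{ii} a_i - \sum_{i=1}^N \sigma^2 B_{ii}\Big| \ge (\log N)^{\frac32+2\alpha}\,\sigma^2\Big(\sum_{i=1}^N |B_{ii}|^2\Big)^{1/2}\Big\} \le C N^{-\log\log N}, \tag{2.69}$$

$$P\Big\{\Big|\sum_{i \ne j} \bar a_i B_{ij} a_j\Big| \ge (\log N)^{3+2\alpha}\,\sigma^2\Big(\sum_{i \ne j} |B_{ij}|^2\Big)^{1/2}\Big\} \le C N^{-\log\log N}. \tag{2.70}$$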



To see (2.65), we apply the estimate (2.69) and we obtain that

$$|Z_i| \le (\log N)^C \Big(\sum_{k,l \ne i} \big|\sigma_{ik} G^{(i)}_{kl} \sigma_{li}\big|^2\Big)^{1/2} \tag{2.71}$$

holds with a probability larger than $1 - C N^{-c \log\log N}$ for sufficiently large $N$.

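Denote by $u^{(i)}_\alpha$ and $\lambda^{(i)}_\alpha$ ($\alpha = 1, 2, \ldots, N-1$) the eigenvectors and eigenvalues of $H^{(i)}$. Let $u^{(i)}_\alpha(k)$ denote
the $k$-th coordinate of $u^{(i)}_\alpha$. Then, using $\sigma_{ij}^2 \le 1/M$, (2.57) and (2.56), we have

$$\sum_{k,l \ne i} \big|\sigma_{ik} G^{(i)}_{kl} \sigma_{li}\big|^2 \le \frac{1}{M}\sum_{k \ne i} \sigma_{ik}^2 \big(|G^{(i)}|^2\big)_{kk} = \frac{1}{M}\sum_{k \ne i} \sigma_{ik}^2 \sum_\alpha \frac{|u^{(i)}_\alpha(k)|^2}{|\lambda^{(i)}_\alpha - z|^2} = \frac{1}{M}\sum_{k \ne i} \sigma_{ik}^2\, \frac{\operatorname{Im} G^{(i)}_{kk}(z)}{\eta} \le \frac{C}{M\eta} \quad \text{in } \Omega_\Lambda^c. \tag{2.72}$$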



Here we defined $|A|^2 := A^*A$ for any matrix $A$. Together with (2.71) we have proved (2.65) for a fixed $z$.
The offdiagonal estimate (2.66), for $i \ne j$, is proven similarly, using (2.70) instead of (2.69).
We thus proved that for each fixed $z \in D$, the sets $\Omega_d(z)$, $\Omega_o(z)$ have a very small probability. In order
to prove (2.62), in principle, we have to take the union of uncountably many events. But it is easy to see
that the quantities $Z_i(z)$ and $Z^{(ij)}_{ij}(z)$ defining these events are Lipschitz continuous functions in $z$ with a
Lipschitz constant $\eta^{-2} \le N^2$. Thus controlling them on a sufficiently dense but finite net of points will control
them for every $z \in D$. The number of necessary points is only a power of $N$ while the probability bounds
are smaller than any inverse power of $N$. This proves Lemma 2.11. □
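The step from $(|G|^2)_{kk}$ to $\operatorname{Im} G_{kk}/\eta$ in (2.72) rests on the identity $\sum_l |G_{kl}|^2 = \operatorname{Im} G_{kk}/\eta$, valid for the resolvent of any hermitian matrix. Below is a minimal numerical sketch of this identity; the matrix size, the Gaussian entries and the helper names `resolvent` and `ward_gap` are illustrative assumptions, and the inverse is computed by plain Gauss-Jordan elimination to keep the example self-contained.

```python
import random

def resolvent(H, z):
    """G = (H - z)^{-1} for a small real symmetric matrix H, via Gauss-Jordan elimination."""
    n = len(H)
    # Augmented matrix [H - z | I].
    A = [[complex(H[i][j]) - (z if i == j else 0) for j in range(n)]
         + [1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

def ward_gap(H, z):
    """Largest violation over k of sum_l |G_kl|^2 = Im G_kk / eta."""
    G = resolvent(H, z)
    eta = z.imag
    return max(abs(sum(abs(g) ** 2 for g in G[k]) - G[k][k].imag / eta)
               for k in range(len(H)))

# Illustrative symmetric matrix with Gaussian entries of variance 1/n.
rng = random.Random(0)
n = 6
H = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        H[i][j] = H[j][i] = rng.gauss(0, 1) / n ** 0.5
gap = ward_gap(H, complex(0.3, 0.05))
```

The value `gap` is zero up to rounding errors, independently of the distribution of the entries; only the hermitian symmetry of $H$ is used.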

The key step in the proof of Theorem 2.9 is the following lemma, which says that if $\Lambda_d + \Lambda_o$ is somewhat small,
like $(\log N)^{-C}$, then the estimate can be boosted to show that it is much smaller, like $(M\eta)^{-1/2}$.

Lemma 2.13 Recall $\Lambda_d$, $\Lambda_o$ and $\Omega$ defined in (2.53) and (2.61) and recall the set $D$ from (2.44). Then for
any $z \in D$ and in the event $\Omega^c$, we have the following implication: if

$$\Lambda_o(z) + \Lambda_d(z) \le (\log N)^{-C} \tag{2.73}$$

then

$$\Lambda_o(z) + \Lambda_d(z) \le \frac{(\log N)^C}{\sqrt{M\eta}}. \tag{2.74}$$

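Proof of Lemma 2.13. Choosing $C_1$ in (2.44) sufficiently large, we can ensure from Lemma 2.11 that

$$|Z^{(ij)}_{ij}| \ll 1, \qquad |Z_i| \ll 1 \qquad \text{in } \Omega^c.$$

We first estimate the offdiagonal term $G_{ij}$. Under the condition (2.73), from (2.8), (2.55) and (2.56) we have

$$|G_{ij}| = |G_{ii}|\,|G^{(i)}_{jj}|\,|K^{(ij)}_{ij}| \le C\big(|h_{ij}| + |Z^{(ij)}_{ij}|\big), \qquad i \ne j.$$

By Lemma 2.11 and under the condition (2.73) we have

$$\Lambda_o \le \frac{C(\log N)^C}{\sqrt{M\eta}} \quad \text{in } \Omega^c. \tag{2.75}$$

This proves the estimate (2.74) for the summand $\Lambda_o$.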

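Now we estimate the diagonal terms. Recalling $\Upsilon_i = A_i + h_{ii} - Z_i$ from (2.48), with (2.58), (2.75) and
Lemma 2.11 we have

$$\Upsilon := \max_i |\Upsilon_i| \le C\frac{(\log N)^C}{\sqrt{M\eta}} \ll 1 \quad \text{in } \Omega^c \tag{2.76}$$

(in the last step we used $z \in D$ and that $C_1$ is large). From (2.47) we have the identity

$$v_i = G_{ii} - m_{sc} = \frac{1}{-z - m_{sc} - \big(\sum_j \sigma_{ij}^2 v_j - \Upsilon_i\big)} - m_{sc}. \tag{2.77}$$

Using that $(m_{sc} + z) = -m_{sc}^{-1}$, the fact that $|m_{sc} + z| \ge 1$ and that $\Lambda_d + \Upsilon \ll 1$, we can expand (2.77) as

$$v_i = m_{sc}^2\Big(\sum_j \sigma_{ij}^2 v_j - \Upsilon_i\Big) + m_{sc}^3\Big(\sum_j \sigma_{ij}^2 v_j - \Upsilon_i\Big)^2 + O\Big(\sum_j \sigma_{ij}^2 v_j - \Upsilon_i\Big)^3 = m_{sc}^2\Big(\sum_j \sigma_{ij}^2 v_j - \Upsilon_i\Big) + O\big((\Lambda_d + \Upsilon)^2\big). \tag{2.78}$$

Summing up this formula for all $i$ and recalling the definition $[v] := \frac{1}{N}\sum_i v_i = m - m_{sc}$ yield

$$[v] = m_{sc}^2 [v] - \frac{m_{sc}^2}{N}\sum_i \Upsilon_i + O\big((\Lambda_d + \Upsilon)^2\big).$$

Introducing the notations $\zeta := m_{sc}^2(z)$, $[\Upsilon] := \frac{1}{N}\sum_i \Upsilon_i$ for simplicity, we have (using $\Lambda_d \le 1$ and $|\zeta| \le 1$)

$$(1 - \zeta)[v] = -\zeta[\Upsilon] + O\big((\Lambda_d + \Upsilon)^2\big) = O\big(\Lambda_d^2 + \Upsilon\big). \tag{2.79}$$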
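The second order expansion (2.78) can be sanity-checked numerically. In the sketch below, `w` stands for the combination $\sum_j \sigma_{ij}^2 v_j - \Upsilon_i$, treated as a free small complex number (an assumption made purely for illustration); the remainder should then be of third order in $|w|$.

```python
import cmath

def m_sc(z: complex) -> complex:
    """Root of m^2 + z*m + 1 = 0 with positive imaginary part."""
    s = cmath.sqrt(z * z - 4)
    m = (-z + s) / 2
    return m if m.imag > 0 else (-z - s) / 2

def expansion_error(z: complex, w: complex) -> float:
    """|exact v_i from (2.77) - (m^2 w + m^3 w^2)|, with w in place of sum_j sigma_ij^2 v_j - Upsilon_i."""
    m = m_sc(z)
    exact = 1 / (-z - m - w) - m
    approx = m ** 2 * w + m ** 3 * w ** 2
    return abs(exact - approx)

z = complex(0.5, 0.1)
ws = [complex(t, -t / 2) for t in (1e-1, 1e-2, 1e-3)]
errs = [expansion_error(z, w) for w in ws]
```

Since $-z - m_{sc} = 1/m_{sc}$, the exact remainder is $m_{sc}^4 w^3/(1 - m_{sc} w)$, so each entry of `errs` shrinks by roughly a factor $10^{3}$ when $|w|$ shrinks by a factor 10.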

Recall that $\Sigma$ denotes the matrix of covariances, $\Sigma_{ij} = \sigma_{ij}^2$, and we know that 1 is a simple eigenvalue of
$\Sigma$ with the constant vector $e = N^{-1/2}(1, 1, \ldots, 1)$ as the eigenvector. Let $Q := I - |e\rangle\langle e|$ be the projection
onto the orthogonal complement of $e$; note that $\Sigma$ and $Q$ commute. Let $\|\cdot\|_{\infty\to\infty}$ denote the $\ell^\infty \to \ell^\infty$
matrix norm. With these notations we can combine (2.79) and (2.78) to get

$$v_i - [v] = \zeta\sum_j \sigma_{ij}^2 (v_j - [v]) - \zeta(\Upsilon_i - [\Upsilon]) + O\big(\Lambda_d^2 + \Upsilon\big).$$



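When summing up for all $i$, all three explicit terms sum up to zero, hence so does the error term. Therefore
$Q$ acts as identity on the vector of error terms, and we have

$$v_i - [v] = O\Big(\Big\|\frac{\zeta Q}{1 - \zeta\Sigma}\Big\|_{\infty\to\infty}\big(\Lambda_d^2 + \Upsilon\big)\Big). \tag{2.80}$$

Combining (2.79) with (2.80), we have

$$\Lambda_d = \max_i |v_i| \le C\Big(\Big\|\frac{\zeta Q}{1 - \zeta\Sigma}\Big\|_{\infty\to\infty} + \frac{1}{|1 - \zeta|}\Big)\big(\Lambda_d^2 + \Upsilon\big). \tag{2.81}$$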

From this calculation it is clear how the estimates deteriorate at the edge. It is easy to check that

$$|1 - \zeta| = |1 - m_{sc}^2(z)| \sim \sqrt{\kappa + \eta}, \qquad z = E + i\eta, \quad \kappa = \big||E| - 2\big|.$$

However, as we remarked at the beginning of the proof, we assume $\kappa \ge \kappa_0 > 0$ for simplicity, so we will not
follow this deterioration.
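The edge behaviour $|1 - m_{sc}^2(z)| \sim \sqrt{\kappa + \eta}$ is easy to observe numerically. The sketch below evaluates the ratio of the two sides on a hypothetical grid ($|E| \le 5$, $10^{-4} \le \eta \le 10$); the grid and the function name `edge_ratio` are assumptions for illustration only.

```python
import cmath
import math

def m_sc(z: complex) -> complex:
    """Root of m^2 + z*m + 1 = 0 with positive imaginary part."""
    s = cmath.sqrt(z * z - 4)
    m = (-z + s) / 2
    return m if m.imag > 0 else (-z - s) / 2

def edge_ratio(E: float, eta: float) -> float:
    """|1 - m_sc(z)^2| / sqrt(kappa + eta) with kappa = ||E| - 2|."""
    kappa = abs(abs(E) - 2)
    m = m_sc(complex(E, eta))
    return abs(1 - m * m) / math.sqrt(kappa + eta)

ratios = [edge_ratio(k / 4, 10.0 ** (-j)) for k in range(-20, 21) for j in range(-1, 5)]
ratio_min, ratio_max = min(ratios), max(ratios)
```

On this grid the ratio stays between two positive constants, including at the spectral edges $E = \pm 2$ with small $\eta$, where both sides are of order $\sqrt{\eta}$.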

To estimate the norm of the resolvent, we recall the following elementary lemma:

Lemma 2.14 [48, Lemma 5.3] Let $\delta_- > 0$ be a given constant. Then there exist small real numbers $\tau \ge 0$
and $c_1 > 0$, depending only on $\delta_-$, such that for any positive number $\delta_+$, we have

$$\max_{x \in [-1+\delta_-,\, 1-\delta_+]} \Big|\tau + x\, m_{sc}^2(z)\Big|^2 \le \big(1 - c_1 q(z)\big)(1 + \tau)^2 \tag{2.82}$$

with

$$q(z) := \max\big\{\delta_+, |1 - \operatorname{Re} m_{sc}^2(z)|\big\}. \tag{2.83}$$
□

Lemma 2.15 Suppose that $\Sigma$ satisfies (2.29), i.e., $\mathrm{Spec}(Q\Sigma) \subset [-1+\delta_-, 1-\delta_+]$. Then we have

$$\Big\|\frac{Q}{1 - m_{sc}^2(z)\Sigma}\Big\|_{\infty\to\infty} \le \frac{C(\delta_-)\log N}{q(z)} \tag{2.84}$$

with some constant $C(\delta_-)$ depending on $\delta_-$ and with $q$ defined in (2.83).



From this lemma one can see that the deterioration of the estimates at the edge caused by the first term
in the right hand side of (2.81) can be offset by assuming $\delta_+ > 0$.

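Proof of Lemma 2.15: Let $\|\cdot\|$ denote the usual $\ell^2 \to \ell^2$ matrix norm and recall $\zeta = m_{sc}^2(z)$. Rewrite

$$\frac{Q}{1 - \zeta\Sigma} = \frac{1}{1+\tau}\,\frac{Q}{1 - \frac{\zeta\Sigma + \tau}{1+\tau}}$$

with $\tau$ given in (2.82). By (2.82), we have

$$\Big\|\frac{\zeta\Sigma + \tau}{1+\tau}\,Q\Big\| \le \sup_{x \in [-1+\delta_-,\,1-\delta_+]}\Big|\frac{\zeta x + \tau}{1+\tau}\Big| \le \big(1 - c_1 q(z)\big)^{1/2}.$$

To estimate the $\ell^\infty \to \ell^\infty$ norm of this matrix, recall that $|\zeta| \le 1$ and $\sum_j |\Sigma_{ij}| = \sum_j \sigma_{ij}^2 = 1$. Thus we have

$$\Big\|\frac{\zeta\Sigma + \tau}{1+\tau}\Big\|_{\infty\to\infty} = \max_i \sum_j \Big|\Big(\frac{\zeta\Sigma + \tau}{1+\tau}\Big)_{ij}\Big| \le \frac{1}{1+\tau}\max_i \sum_j \big|\zeta\sigma_{ij}^2 + \tau\delta_{ij}\big| \le \frac{|\zeta| + \tau}{1+\tau} \le 1.$$

To see (2.84), we can expand, up to an arbitrary threshold $n_0$,

$$\Big\|\frac{1}{1 - \frac{\zeta\Sigma+\tau}{1+\tau}}\,Q\Big\|_{\infty\to\infty} \le \sum_{n < n_0}\Big\|\Big(\frac{\zeta\Sigma+\tau}{1+\tau}\Big)^n\Big\|_{\infty\to\infty} + \sum_{n \ge n_0}\Big\|\Big(\frac{\zeta\Sigma+\tau}{1+\tau}\Big)^n Q\Big\|_{\infty\to\infty}$$

$$\le n_0 + \sqrt{N}\sum_{n \ge n_0}\Big\|\Big(\frac{\zeta\Sigma+\tau}{1+\tau}\Big)^n Q\Big\| \le n_0 + \sqrt{N}\sum_{n \ge n_0}\big(1 - c_1 q(z)\big)^{n/2}.$$

Choosing $n_0 = C\log N/q(z)$, we have proved Lemma 2.15.

□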
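For completeness, the last step of this proof can be spelled out as follows; this is a sketch of the elementary geometric-series estimate, with the explicit choice of the constant in $n_0$ being an assumption made here for concreteness. Write $r := (1 - c_1 q(z))^{1/2} \le 1 - \tfrac12 c_1 q(z)$.

```latex
\sqrt{N}\sum_{n \ge n_0} r^{\,n}
  = \sqrt{N}\,\frac{r^{\,n_0}}{1-r}
  \le \frac{2\sqrt{N}}{c_1 q(z)}\,\exp\!\Big(-\tfrac12 c_1 q(z)\, n_0\Big).
% With n_0 = \log N / (c_1 q(z)) the exponential equals N^{-1/2}, hence
n_0 + \sqrt{N}\sum_{n \ge n_0} r^{\,n}
  \le \frac{\log N}{c_1 q(z)} + \frac{2}{c_1 q(z)}
  \le \frac{C \log N}{q(z)} .
```

The threshold $n_0$ is thus chosen just large enough for the exponential decay in the $\ell^2$ norm to compensate the factor $\sqrt{N}$ lost when passing from the $\ell^2 \to \ell^2$ norm to the $\ell^\infty \to \ell^\infty$ norm.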



We now return to the proof of Lemma 2.13. Recall that we are in the set $\Omega^c$ and $\kappa \ge \kappa_0$, i.e., $1 - \operatorname{Re} m_{sc}^2(z)$
and thus $q(z)$ are separated away from zero. First, inserting (2.84) into (2.81), we obtain

$$\Lambda_d \le C\big(\Lambda_d^2 + \Upsilon\big)\log N.$$

By the assumption (2.73), we have $C\Lambda_d \log N \le 1/2$, so the quadratic term can be absorbed in the linear
term on the left hand side, and we get

$$\Lambda_d \le C\,\Upsilon\log N.$$

Using the bound (2.76) on $\Upsilon$, we obtain

$$\Lambda_d \le \frac{(\log N)^C}{\sqrt{M\eta}}, \tag{2.85}$$

which, together with (2.75), completes the proof of (2.74) and thus Lemma 2.13 is proven. □
Proof of Theorem 2.9. Introducing the functions

$$R(z) := (\log N)^{-C}, \qquad S(z) := \frac{(\log N)^C}{\sqrt{M\eta}},$$

Lemma 2.13 states that, in the event $\Omega^c$, if $\Lambda_d(z) + \Lambda_o(z) \le R(z)$ holds for some $z \in D$, then
$\Lambda_d(z) + \Lambda_o(z) \le S(z)$. By assumption (2.44) of Theorem 2.9, we have $S(z) < R(z)$ for any $z \in D$. Clearly,
$\Lambda_d(z) + \Lambda_o(z) \le 3/\eta \le 3/10$ for $\operatorname{Im} z = 10$. Using this information, one can mimic the proof of Lemma
2.11 and Lemma 2.13 to get an a priori bound $\Lambda_d(z) + \Lambda_o(z) \le R(z)$ for $\eta = 10$. Using that $R(z)$, $S(z)$,
$\Lambda_d(z)$ and $\Lambda_o(z)$ are continuous, moving $z$ towards the real axis, by a continuity argument, we get that
$\Lambda_d(z) + \Lambda_o(z) \le S(z)$ in $\Omega^c$, as long as the condition (2.44) for $z$ is satisfied. This proves Theorem 2.9. □



2.3.3 Sketch of the proof of the semicircle law for Stieltjes transform

In this section we strengthen the estimate of Theorem 2.9 for the Stieltjes transform $m(z) = \frac{1}{N}\sum_i G_{ii}$. The
key improvement is that $|m - m_{sc}|$ will be estimated with a precision $(M\eta)^{-1}$ while $|G_{ii} - m_{sc}|$ was
controlled by a precision $(M\eta)^{-1/2}$ only (modulo logarithmic terms and terms expressing the deterioration
of the estimate near the edge). In the following theorem we prove a partial version of (2.34) of Theorem 2.5:

Theorem 2.16 Assume the conditions of Theorem 2.5, let $\kappa_0 > 0$ be fixed and recall the definition of the
domain $D = D_{\kappa_0}$ from (2.44). Then for any $\varepsilon > 0$ and $K > 0$ there exists a constant $C = C(\varepsilon, K, \kappa_0)$ such
that

$$P\Big(\bigcup_{z \in D}\Big\{|m(z) - m_{sc}(z)| \ge \frac{N^{\varepsilon}}{M\eta}\Big\}\Big) \le \frac{C}{N^K}. \tag{2.86}$$

Proof of Theorem 2.16. We mostly follow the proof of Theorem 2.9. Fix $z \in D$ and we can assume that
we work in the complement of the small probability events estimated in (2.45) and (2.46). In particular, the
estimate (2.74) is available. As in (2.79) we have that

$$[v] = -\frac{\zeta}{1 - \zeta}\,\frac{1}{N}\sum_i \Upsilon_i + O\big((\Lambda_d + \Upsilon)^2\big)$$

holds with a very high probability, but now we keep the first term and estimate it better than the most
trivial bound used in (2.79) to extract some cancellation from the fluctuating sum. Then with (2.76) and
(2.85) we have

$$m - m_{sc} = [v] = O\Big(\frac{1}{N}\sum_i \Upsilon_i\Big) + O\Big(\frac{N^{\varepsilon}}{M\eta}\Big)$$

holds with a very high probability for any small $\varepsilon > 0$. Recall that $\Upsilon_i = A_i + h_{ii} - Z_i$. We have, from (2.49),
(2.55) and $\sigma_{ij}^2 \le M^{-1}$,

$$|A_i| \le \frac{C}{M} + C\Lambda_o^2 \le \frac{C N^{\varepsilon}}{M\eta},$$

where we used (2.74) to bound $\Lambda_o$.
We thus obtain that

$$m - m_{sc} = O\Big(\frac{1}{N}\sum_i h_{ii}\Big) + O\Big(\frac{1}{N}\sum_i Z_i\Big) + O\Big(\frac{N^{\varepsilon}}{M\eta}\Big) \tag{2.87}$$

holds with a very high probability. Since the $h_{ii}$'s are independent, applying the first estimate in the large
deviation Lemma 2.12, we have

$$P\Big(\Big|\frac{1}{N}\sum_i h_{ii}\Big| \ge (\log N)^{\frac32+\alpha}\frac{1}{\sqrt{MN}}\Big) \le C N^{-c\log\log N}. \tag{2.88}$$

On the complement event, the estimate $(\log N)^{\frac32+\alpha}(MN)^{-1/2}$ can be included in the last error term in (2.87).
It only remains to bound

$$\frac{1}{N}\sum_i Z_i,$$

whose moments are bounded in the next lemma, and we will comment on its proof below.

Lemma 2.17 [49, Lemma 5.2], [50, Lemma 4.1] Recalling the definition of $Z_i$ from (2.50), for any fixed $z$
in the domain $D$ and any natural number $s$, we have

$$E\Big|\frac{1}{N}\sum_{i=1}^N Z_i\Big|^{2s} \le C_s\Big(\frac{(\log N)^C}{M\eta}\Big)^{2s}.$$

Using this lemma, we have that for any $\varepsilon > 0$ and $K > 0$,

$$P\Big(\Big|\frac{1}{N}\sum_{i=1}^N Z_i\Big| \ge \frac{N^{\varepsilon}}{M\eta}\Big) \le N^{-K} \tag{2.89}$$

for sufficiently large $N$. Combining this with (2.88) and (2.87), we obtain (2.86) and complete the proof of
Theorem 2.16. □
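The passage from the moment bound of Lemma 2.17 to the probability bound (2.89) is a high moment Markov inequality. The following sketch assumes the moment bound in the form $E|[Z]|^{2s} \le C_s\big((\log N)^C/(M\eta)\big)^{2s}$ with $[Z] := N^{-1}\sum_i Z_i$; the precise power of the logarithm is an assumption here and plays no role in the argument.

```latex
P\Big( |[Z]| \ge \frac{N^{\varepsilon}}{M\eta} \Big)
  \le \Big(\frac{M\eta}{N^{\varepsilon}}\Big)^{2s}\, E\,|[Z]|^{2s}
  \le C_s\, (\log N)^{2sC}\, N^{-2s\varepsilon}
  \le N^{-K},
% the last step holds for s > K/(2\varepsilon) fixed and N sufficiently large,
% since (\log N)^{2sC} grows slower than any positive power of N.
```

The exponent $s$ is chosen depending on $\varepsilon$ and $K$ only, so the constant $C_s$ is harmless.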

Sketch of the proof of Lemma 2.17. We have two different proofs for this lemma, both are quite involved.
Here we present the ideas of the first proof from [49]. The argument is a long and carefully organized high
moment calculation, similar to the second moment bound in (2.23), but now we extract an additional factor
from the sum. Note that boosting the second moment calculation (2.23) to higher moments (and recalling
that for universal Wigner matrices $M$ replaces $N$) we can prove

$$|Z_i| \lesssim \frac{1}{\sqrt{M\eta}} \tag{2.90}$$

(we will indicate the proof in Lemma 2.18 below). If the $Z_i$'s were independent, then the central limit theorem
would imply that

$$\Big|\frac{1}{N}\sum_{i=1}^N Z_i\Big| \lesssim \frac{1}{\sqrt{N}}\,\frac{1}{\sqrt{M\eta}},$$

which would be more than enough. They are not quite independent, but almost. The dependence among
different $Z_i$'s can be extracted using the resolvent formulas in Section 2.1. We will sketch the second
moment calculation, i.e., the case $s = 1$. More details and higher moments are given in Sections 8-9 of [49].



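Since $E Z_i = 0$, the variance of $[Z]$ is given by

$$\frac{1}{N^2}\,E\Big|\sum_{i=1}^N Z_i\Big|^2 = \frac{1}{N^2}\sum_{\alpha \ne \beta} E\,\bar Z_\alpha Z_\beta + \frac{1}{N^2}\sum_{\alpha} E|Z_\alpha|^2. \tag{2.91}$$

We start by estimating the first term of (2.91) for $\alpha = 1$ and $\beta = 2$. The basic idea is to rewrite $G^{(1)}_{kl}$ as

$$G^{(1)}_{kl} = P^{(12)}_{kl} + P^{(1)}_{kl} \tag{2.92}$$

with $P^{(12)}_{kl}$ independent of $a^1$, $a^2$ and $P^{(1)}_{kl}$ independent of $a^1$ (recall the notational convention: superscripts
indicate independence of the corresponding column of $H$). To construct this decomposition for $k, l \notin \{1,2\}$,
by (2.10) or (2.11) we rewrite $G^{(1)}_{kl}$ as

$$G^{(1)}_{kl} = G^{(12)}_{kl} + \frac{G^{(1)}_{k2}\,G^{(1)}_{2l}}{G^{(1)}_{22}}, \qquad k, l \notin \{1,2\}. \tag{2.93}$$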

The first term on the r.h.s. is independent of $a^2$. Applying Theorem 2.9 for the minors, we get that $G^{(1)}_{kl} \lesssim
(M\eta)^{-1/2}$ for $k \ne l$ and $G^{(1)}_{kk} \ge c > 0$, thus

$$\Big|\frac{G^{(1)}_{k2}\,G^{(1)}_{2l}}{G^{(1)}_{22}}\Big| \lesssim \frac{1}{M\eta} \tag{2.94}$$

holds with a very high probability. Note that this bound is the square of the bound $G^{(1)}_{kl} \lesssim (M\eta)^{-1/2}$ from
Theorem 2.9.

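Now we define $P^{(1)}$ and $P^{(12)}$ (for $k, l \ne 1$) as

1. if $k, l \ne 2$,

$$P^{(12)}_{kl} := G^{(12)}_{kl}, \qquad P^{(1)}_{kl} := \frac{G^{(1)}_{k2}\,G^{(1)}_{2l}}{G^{(1)}_{22}}; \tag{2.95}$$

2. if $k = 2$ or $l = 2$,

$$P^{(12)}_{kl} := 0, \qquad P^{(1)}_{kl} := G^{(1)}_{kl}. \tag{2.96}$$

Hence (2.92) holds and $P^{(12)}_{kl}$ is independent of $a^2$.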

The size of the quadratic forms $a^1 \cdot P^{(1)} a^1$ and $a^1 \cdot P^{(12)} a^1$ is estimated in the following lemma whose
proof is postponed.

Lemma 2.18 For $N^{-1} \ll \eta \le 1$ and fixed $p \in \mathbb{N}$, we have

$$E\big|a^1 \cdot P^{(1)} a^1\big|^p \le \frac{C_p}{(M\eta)^p}, \qquad E\big|a^1 \cdot P^{(12)} a^1\big|^p \le \frac{C_p}{(M\eta)^{p/2}}. \tag{2.97}$$

Note that the first quadratic form is smaller, but the second one is independent of column (2) of $H$.



Define an operator $IE_i := I - E_i$, where $I$ is the identity operator. With this convention, we have the
following expansion of $Z_1$:

$$Z_1 = IE_1\, a^1 \cdot P^{(12)} a^1 + IE_1\, a^1 \cdot P^{(1)} a^1. \tag{2.98}$$

Exchanging the indices 1 and 2, we can define $P^{(21)}$ and $P^{(2)}$ and expand $Z_2$ as

$$Z_2 = IE_2\, a^2 \cdot P^{(21)} a^2 + IE_2\, a^2 \cdot P^{(2)} a^2. \tag{2.99}$$

Here $P^{(21)}$ is independent of $a^2$ and $a^1$; $P^{(2)}$ is independent of $a^2$. Combining (2.99) with (2.98), we have
for the $\alpha = 1$, $\beta = 2$ cross term in (2.91) that

$$E\,\bar Z_1 Z_2 = E\Big[\overline{\big(IE_1\{a^1 \cdot P^{(12)} a^1 + a^1 \cdot P^{(1)} a^1\}\big)}\,\big(IE_2\{a^2 \cdot P^{(21)} a^2 + a^2 \cdot P^{(2)} a^2\}\big)\Big]. \tag{2.100}$$

Note that if $X^{(i)}$ is a random variable, independent of $a^i$, then for any random variable $Y$, we have

$$IE_i\big[Y X^{(i)}\big] = X^{(i)}\, IE_i Y,$$

in particular $IE_i X^{(i)} = 0$ (with $Y = 1$). Thus

$$E\Big[\big(IE_1 X^{(2)}\big)\big(IE_2 X^{(1)}\big)\Big] = E\, IE_1\Big[\big(IE_2 X^{(1)}\big) X^{(2)}\Big] = 0$$

since $E\, IE_i = 0$. Using this idea, one can easily see that the only non-vanishing term on the right hand side
of (2.100) is

$$E\Big[\overline{\big(IE_1\, a^1 \cdot P^{(1)} a^1\big)}\,\big(IE_2\, a^2 \cdot P^{(2)} a^2\big)\Big]. \tag{2.101}$$

By the Cauchy-Schwarz inequality and Lemma 2.18 we obtain

$$|E\,\bar Z_1 Z_2| \lesssim \frac{1}{(M\eta)^2}. \tag{2.102}$$

Using (2.92), Lemma 2.18 also implies that

$$E|Z_i|^p \le \frac{C_p}{(M\eta)^{p/2}}, \qquad 1 \le i \le N, \tag{2.103}$$

i.e. it also proves (2.90) and it estimates the second term in (2.91) by $N^{-1}(M\eta)^{-1} \le (M\eta)^{-2}$. Since the
indices 1 and 2 in (2.102) can be replaced by $\alpha \ne \beta$, together with (2.91) we have thus proved Lemma 2.17
for $s = 1$. □
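The vanishing of the mixed terms can be illustrated on a toy probability space. The sketch below replaces the columns $a^1$, $a^2$ and the remaining randomness by three independent fair $\pm 1$ coins `a1`, `a2`, `b` (a hypothetical model, chosen so that all expectations can be computed exactly), and checks that $E[(IE_1 X)(IE_2 Y)] = 0$ whenever $X$ does not depend on `a2`, while a product of two variables depending on both coins need not vanish.

```python
from fractions import Fraction
from itertools import product

VALUES = (-1, 1)  # each of a1, a2, b is a fair +-1 coin

def E(f):
    """Full expectation of f(a1, a2, b)."""
    return sum(Fraction(f(a1, a2, b)) for a1, a2, b in product(VALUES, repeat=3)) / 8

def IE1(f):
    """IE_1 f = f - E_1 f, where E_1 averages over a1 only."""
    return lambda a1, a2, b: f(a1, a2, b) - sum(Fraction(f(s, a2, b)) for s in VALUES) / 2

def IE2(f):
    """IE_2 f = f - E_2 f, where E_2 averages over a2 only."""
    return lambda a1, a2, b: f(a1, a2, b) - sum(Fraction(f(a1, s, b)) for s in VALUES) / 2

X = lambda a1, a2, b: 3 * a1 * b + a1 + 2 * b        # independent of a2
Y = lambda a1, a2, b: a1 * a2 + 5 * a2 * b + a1 + 7  # depends on everything
W = lambda a1, a2, b: a1 * a2                         # depends on both a1 and a2

mixed = E(lambda a1, a2, b: IE1(X)(a1, a2, b) * IE2(Y)(a1, a2, b))
surviving = E(lambda a1, a2, b: IE1(W)(a1, a2, b) * IE2(W)(a1, a2, b))
```

Here `mixed` is exactly zero, mirroring the terms of (2.100) that contain $P^{(12)}$ or $P^{(21)}$, whereas `surviving` is nonzero, mirroring the single remaining term (2.101).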



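Proof of Lemma 2.18. First we rewrite $a^1 \cdot P^{(1)} a^1$ as follows:

$$a^1 \cdot P^{(1)} a^1 = \sum_{k,l \ne 2} \bar a^1_k\, \frac{G^{(1)}_{k2}\,G^{(1)}_{2l}}{G^{(1)}_{22}}\, a^1_l + \bar a^1_2 \sum_{l \ne 2} G^{(1)}_{2l}\, a^1_l + a^1_2 \sum_{k \ne 2} \bar a^1_k\, G^{(1)}_{k2} + |a^1_2|^2\, G^{(1)}_{22}. \tag{2.104}$$

By the large deviation estimate (2.70) and (2.94), we have

$$P\Big(\Big|\sum_{k,l \ne 2} \bar a^1_k\, \frac{G^{(1)}_{k2}\,G^{(1)}_{2l}}{G^{(1)}_{22}}\, a^1_l\Big| \ge \frac{(\log N)^C}{M\eta}\Big) \le N^{-c\log\log N}. \tag{2.105}$$

Similarly, from (2.68) and the fact that $\|a^1\|_\infty \lesssim M^{-1/2}$ we get that the second and third terms in (2.104)
are bounded by $\frac{1}{M\sqrt{\eta}} \le (M\eta)^{-1}$. The last term is even smaller, it is of order $1/M$, with a very high
probability. We have thus proved that

$$P\Big(\big|a^1 \cdot P^{(1)} a^1\big| \le \frac{(\log N)^C}{M\eta}\Big) \ge 1 - N^{-c\log\log N}. \tag{2.106}$$

This inequality implies the first desired inequality in (2.97) except on the exceptional set with probability
less than any power of $1/N$. Since all Green functions are bounded by $\eta^{-1} \le N$, the contribution from the
exceptional set is negligible and this proves the first estimate in (2.97). The second bound is proved similarly.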

□

2.4 Strong local semicircle law

In this section we present our latest results from [50] which remove the $\kappa$ dependence from Theorem 2.5,
Theorem 2.6 and Theorem 2.7 for ensembles with spread $M = N$, in particular for generalized Wigner matrices.
Recall the notations:

$$\Lambda_d := \max_k |G_{kk} - m_{sc}|, \qquad \Lambda_o := \max_{i \ne j} |G_{ij}|, \qquad \Lambda := |m - m_{sc}|,$$

and recall that all these quantities depend on the spectral parameter $z$ and on $N$.

Theorem 2.19 (Strong local semicircle law) [50, Theorem 2.1] Let $H = (h_{ij})$ be a hermitian or symmetric
$N \times N$ random matrix, $N \ge 3$, with $E h_{ij} = 0$, $1 \le i,j \le N$, and assume that the variances $\sigma_{ij}^2$
satisfy (1.16), (2.29) with some positive constants $\delta_\pm > 0$ and the upper bound

$$\sigma_{ij}^2 \le \frac{C_0}{N}. \tag{2.107}$$

Suppose that the distributions of the matrix elements have a uniformly subexponential decay in the sense that
there exist constants $C, \vartheta > 0$, independent of $N$, such that for any $x \ge 1$ and $1 \le i,j \le N$ we have

$$P(|h_{ij}| \ge x\sigma_{ij}) \le C\exp\big(-x^{\vartheta}\big). \tag{2.108}$$

Then for any constant $\xi > 1$ there exist positive constants $L$, $C$ and $c$, depending only on $\xi$, on $\vartheta$
from (2.108), on $\delta_\pm$ from (2.29) and on $C_0$ from (2.107), such that the Stieltjes transform of the empirical eigenvalue
distribution of $H$ satisfies

$$P\Big(\bigcup_{z \in S_L}\Big\{\Lambda(z) \ge \frac{(\log N)^{4L}}{N\eta}\Big\}\Big) \le C\exp\big[-c(\log N)^{\xi}\big] \tag{2.109}$$

with

$$S_L := \Big\{z = E + i\eta : |E| \le 5, \; N^{-1}(\log N)^{10L} < \eta \le 10\Big\}. \tag{2.110}$$

The individual matrix elements of the Green function satisfy that

$$P\Big(\bigcup_{z \in S_L}\Big\{\Lambda_d(z) + \Lambda_o(z) \ge (\log N)^{4L}\sqrt{\frac{\operatorname{Im} m_{sc}(z)}{N\eta}} + \frac{(\log N)^{4L}}{N\eta}\Big\}\Big) \le C\exp\big[-c(\log N)^{\xi}\big]. \tag{2.111}$$

Furthermore, in the following set outside of the limiting spectrum,

$$O_L := \Big\{z = E + i\eta : N\eta\sqrt{\kappa} \ge (\log N)^{4L}, \; \kappa \ge \eta, \; |E| > 2\Big\} \quad \text{with } \kappa := \big||E| - 2\big|, \tag{2.112}$$

we have the stronger estimate

$$P\Big(\bigcup_{z \in O_L}\Big\{\Lambda(z) \ge \frac{(\log N)^{4L}}{N\kappa}\Big\}\Big) \le C\exp\big[-c(\log N)^{\xi}\big]. \tag{2.113}$$

The subexponential decay condition (2.108) can be weakened if we are not aiming at error estimates
faster than any power law of $N$. This can be easily carried out and we will not pursue it in this paper.

Prior to our results in [48] and [49] , a central limit theorem for the semicircle law on macroscopic scale for 
band matrices was established by Guionnet [59] and Anderson and Zeitouni [5] ; a semicircle law for Gaussian 
band matrices was proved by Disertori, Pinson and Spencer [26]. For a review on band matrices, see the 
recent article [93] by Spencer. 

As before, the local semicircle estimates imply that the empirical counting function of the eigenvalues is 
close to the semicircle counting function and that the locations of the eigenvalues are close to their classical 
location. We have the following improved results (cf. Theorems 2.6-2.7): 

Theorem 2.20 [50, Theorem 2.2] Assume the conditions of Theorem 2.19, i.e. (1.16), (2.29) with some
positive constants $\delta_\pm > 0$, (2.107) and (2.108). Then for any constant $\xi > 1$ there exist constants $L_1$, $C$ and
$c > 0$, depending only on $\vartheta$, $\delta_\pm$ and $C_0$ such that

$$P\Big\{\exists j : |\lambda_j - \gamma_j| \ge (\log N)^{L_1}\big[\min(j, N - j + 1)\big]^{-1/3} N^{-2/3}\Big\} \le C\exp\big[-c(\log N)^{\xi}\big] \tag{2.114}$$

and

$$P\Big\{\sup_{|E| \le 5}\big|n(E) - n_{sc}(E)\big| \ge \frac{(\log N)^{L_1}}{N}\Big\} \le C\exp\big[-c(\log N)^{\xi}\big]. \tag{2.115}$$


For Wigner matrices, (2.115) with the factor $N^{-1}$ replaced by $N^{-2/5}$ (in a weaker sense with some
modifications in the statement) was established in [8] and a stronger $N^{-1/2}$ control was proven for the
difference $E\,n(E) - n_{sc}(E)$. In Theorem 1.3 of a recent preprint [100], the following estimate (in our scaling)

$$\Big(E\,|\lambda_j - \gamma_j|^2\Big)^{1/2} \le \big[\min(j, N - j + 1)\big]^{-1/3} N^{-1/6 - \varepsilon_0}, \tag{2.116}$$

with some small positive $\varepsilon_0$, was proved for standard Wigner matrices under the assumption that the third
moment of the matrix element vanishes. In the same paper, it was conjectured that the factor $N^{-1/6-\varepsilon_0}$ on
the right hand side of (2.116) should be replaced by $N^{-2/3+\varepsilon}$. Prior to the work [100], the estimate (2.114)
away from the edges with a slightly weaker probability estimate and with the $(\log N)^{L_1}$ factor replaced by
$N^{\delta}$ for arbitrary $\delta > 0$ was proved in [49] (see the equation before (7.8) in [49]).

We remark that all results are stated for both the hermitian or symmetric case, but the statements and 
the proofs hold for quaternion self-dual random matrices as well (see, e.g., Section 3.1 of [46]). 

There are several improvements of the argument presented in Sections 2.3.2 and 2.3.3 that have led to 
the proof of the optimal Theorem 2.19 for the M ~ N case. Here we mention the most important ones. 

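First, notice that the end of the estimate (2.72) can be done more effectively for $M = N$ and $\sigma_{ij}^2 \le C/N$,
using that

$$\frac{1}{N}\sum_k \operatorname{Im} G^{(i)}_{kk} \le \frac{1}{N}\sum_k \operatorname{Im} G_{kk} + C\Lambda_o^2 = \operatorname{Im} m + C\Lambda_o^2 \le \Lambda + C\Lambda_o^2 + \operatorname{Im} m_{sc}.$$

The gain here is that $\operatorname{Im} m_{sc}(z) \le C\sqrt{\kappa + \eta}$, which is a better estimate near the edges than just $O(1)$. We
therefore introduce

$$\Psi = \Psi(z) := \sqrt{\frac{\Lambda(z) + \operatorname{Im} m_{sc}(z)}{N\eta}}$$

as our main control parameter, and note that this is random, but it depends only on $\Lambda$. Similarly to (2.75)
and (2.76), one can then prove that

$$\Lambda_o + \max_i |\Upsilon_i| \lesssim \Psi \tag{2.117}$$

with very high probability.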

Second, a more careful algebra of the self-consistent equation (2.78) yields the following identity:

$$(1 - m_{sc}^2)[v] = m_{sc}^3[v]^2 + m_{sc}^2[Z] + O\Big(\frac{\Lambda^2}{\log N}\Big) + O\big((\log N)\Psi^2\big), \tag{2.118}$$

where $[Z] := N^{-1}\sum_{i=1}^N Z_i$. The advantage of this formula is that it allows not only to express $[v]$ from the
left hand side (after dividing through with $1 - m_{sc}^2$), but in case of

$$\big|(1 - m_{sc}^2)[v]\big| \ll \big|m_{sc}^3[v]^2\big|$$

(which typically happens exactly near the edge, where $1 - m_{sc}^2$ is small), it is possible to express $[v]$ from the
right hand side. This makes a dichotomy estimate possible by noticing that if

$$(1 - m_{sc}^2)[v] = m_{sc}^3[v]^2 + \text{Small},$$

or, in other words,

$$\alpha(z)\Lambda = \Lambda^2 + \beta(z), \qquad \alpha(z) := \Big|\frac{1 - m_{sc}^2}{m_{sc}^3}\Big| \sim \sqrt{\kappa + \eta}, \qquad \beta = \text{Small}$$

holds, then for some sufficiently large constant $U$ and another constant $C_1(U)$, we have

$$\Lambda(z) \le U\beta(z) \quad \text{or} \quad \Lambda(z) \ge \frac{\alpha(z)}{U} \qquad \text{if } \alpha \ge U^2\beta, \tag{2.119}$$

$$\Lambda(z) \le C_1(U)\beta(z) \qquad \text{if } \alpha \le U^2\beta. \tag{2.120}$$

The bad case, $\Lambda \ge \alpha/U$, is excluded by a continuity argument: we can easily show that for $\eta = 10$ this does
not happen, and then we reduce $\eta$ and see that we are always in the first case, as long as $\alpha \ge U^2\beta$, or, if
$\alpha \le U^2\beta$, then we automatically obtain $\Lambda \lesssim \beta$. The actual proof is more complicated, since the "Small"
term itself depends on $\Lambda$, but this dependence can be absorbed into the other terms. Moreover all these
estimates hold only with a very high probability, so exceptional events have to be tracked.
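The mechanism of the dichotomy and of the continuity argument can be seen in a toy model. The sketch below takes the relation $\alpha\Lambda = \Lambda^2 + \beta$ literally as an exact quadratic equation, with hypothetical choices $\alpha(\eta) = \sqrt{\kappa + \eta}$ and $\beta(\eta) = 1/(N\eta)$; in this toy normalization the small root is at most $2\beta/\alpha$ and the large root is at least $\alpha/2$, which are not the constants of (2.119) and (2.120). Starting on the small branch at $\eta = 10$ and decreasing $\eta$ in small steps, continuity keeps $\Lambda$ on the small branch.

```python
import math

def roots(alpha: float, beta: float):
    """Both solutions of the toy relation alpha*L = L**2 + beta (requires alpha**2 >= 4*beta)."""
    disc = math.sqrt(alpha * alpha - 4 * beta)
    small = 2 * beta / (alpha + disc)  # stable form of (alpha - disc) / 2
    large = (alpha + disc) / 2
    return small, large

def follow_branch(alpha_of_eta, beta_of_eta, etas):
    """Track the solution continuously as eta decreases, starting on the small branch."""
    current = roots(alpha_of_eta(etas[0]), beta_of_eta(etas[0]))[0]
    path = []
    for eta in etas:
        small, large = roots(alpha_of_eta(eta), beta_of_eta(eta))
        # Continuity: move to whichever root is closest to the previous value.
        current = small if abs(small - current) <= abs(large - current) else large
        path.append((eta, current, small, large))
    return path

# Hypothetical parameters for illustration.
N, kappa = 10 ** 6, 0.1
alpha_fn = lambda eta: math.sqrt(kappa + eta)
beta_fn = lambda eta: 1.0 / (N * eta)
etas = [10 * 0.9 ** k for k in range(88)]  # from eta = 10 down to about 1e-3
path = follow_branch(alpha_fn, beta_fn, etas)
```

The two roots are separated by a gap of order $\alpha$, so a continuous solution that starts small can never jump to the large branch; this is the toy version of excluding the bad case $\Lambda \ge \alpha/U$.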

Finally, the estimate in Lemma 2.17 has to incorporate the improved control on the edge. Inspecting the
proof, one sees that the gain $(M\eta)^{-1/2}$ comes from the offdiagonal resolvent elements, in fact Lemma 2.17 is
better written in the following form



E 



N 



i=l 



< Cs 



As before, this can be turned into a probability estimate by taking a large power, $s \sim (\log N)^{\xi}$. Using (2.117)
to estimate $\Lambda_o$ by $\Psi$ and the fact that $\Psi^2 \le o(\Lambda) + \frac{\operatorname{Im} m_{sc}}{N\eta}$ (since $N\eta \gg 1$), one can show that the $m_{sc}^2[Z]$ term
on the r.h.s. of (2.118) is also "Small". The details are given in Sections 3 and 4 of [50].

2.5 Delocalization of eigenvectors

Let $H$ be a universal Wigner matrix with a subexponential decay (2.32). Let $v$ be an $\ell^2$-normalized eigenvector
of $H$, then the size of the $\ell^p$-norm of $v$, for $p > 2$, gives information about delocalization of $v$. We
say that complete delocalization occurs when $\|v\|_p \lesssim N^{-1/2+1/p}$ (note that $\|v\|_p \ge C N^{-1/2+1/p}\|v\|_2$). The
following result shows that for generalized Wigner matrices (1.17), the eigenvectors are fully delocalized with
a very high probability. For universal Wigner matrices with spread $M$ (2.31), the eigenvectors are delocalized
on scale at least $M$.

Theorem 2.21 Under the conditions of Theorem 2.5, for any $E$ with $\kappa = \big||E| - 2\big| \ge \kappa_0$, we have

$$P\Big\{\exists v : Hv = \lambda v, \; |\lambda - E| \le \frac{1}{M}, \; \|v\|_2 = 1, \; \|v\|_\infty \ge \frac{(\log N)^C}{\sqrt{M}}\Big\} \le C N^{-c\log\log N}. \tag{2.121}$$

We remark that $\|v\|_\infty \lesssim M^{-1/2}$ indicates that the eigenvector has to be supported in a region of size at
least $M$, i.e. the localization length of $v$ is at least $M$. We note that the delocalization conjecture predicts
that the localization length is in fact $M^2$, i.e. the optimal bound should be

$$\|v\|_\infty \lesssim \frac{1}{M}$$

with a high probability. As it was explained in Section 1.3, this is an open question. Only some partial results
are available, i.e. we proved in [36, 37] that for random band matrices (1.18) with band width $W \sim M$, the
localization length is at least $M^{1+\frac16}$.

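Proof. We again neglect the dependence of the estimate on $\kappa_0$ (this can be traced back from the proof).
The estimate (2.45) guarantees that

$$|G_{ii}(z)| \le C$$

for any $z = E + i\eta$ with $M\eta \ge (\log N)^C$ with a very high probability. Choose $\eta = (\log N)^C/M$. Let $v_\alpha$ be
the eigenvectors of $H$ and let $v$ be an eigenvector with eigenvalue $\lambda$, where $|\lambda - E| \le 1/M$. Thus

$$|v(i)|^2 \le \frac{2\eta^2\,|v(i)|^2}{(\lambda - E)^2 + \eta^2} \le 2\eta\sum_\alpha \frac{\eta\,|v_\alpha(i)|^2}{(\lambda_\alpha - E)^2 + \eta^2} = 2\eta\operatorname{Im} G_{ii}(z) \le \frac{C(\log N)^C}{M},$$

i.e.

$$\|v\|_\infty \le \frac{C(\log N)^C}{\sqrt{M}}.$$

□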
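The only analytic input of this proof is the elementary inequality $|v_\alpha(i)|^2 \le 2\eta \operatorname{Im} G_{ii}(E + i\eta)$ for every eigenvalue with $|\lambda_\alpha - E| \le \eta$. The sketch below checks it on synthetic data; the evenly spaced eigenvalues and the weights standing in for $|v_\alpha(i)|^2$ are hypothetical and are not drawn from a random matrix.

```python
def im_G_ii(weights, eigenvalues, E, eta):
    """Im G_ii(E + i*eta) = sum_a |v_a(i)|^2 * eta / ((lambda_a - E)^2 + eta^2)."""
    return sum(w * eta / ((lam - E) ** 2 + eta ** 2) for w, lam in zip(weights, eigenvalues))

def component_bound(weights, eigenvalues, E, eta):
    """Upper bound 2*eta*Im G_ii on |v_a(i)|^2, valid for every eigenvalue within eta of E."""
    return 2 * eta * im_G_ii(weights, eigenvalues, E, eta)

# Synthetic spectral data: 41 evenly spaced eigenvalues in [-2, 2] and
# weights |v_a(i)|^2 proportional to a + 1, normalized to sum to one.
eigenvalues = [-2 + 0.1 * k for k in range(41)]
raw = [k + 1 for k in range(41)]
weights = [r / sum(raw) for r in raw]

E, eta = 0.13, 0.2
bound = component_bound(weights, eigenvalues, E, eta)
near = [w for w, lam in zip(weights, eigenvalues) if abs(lam - E) <= eta]
```

Every weight in `near` is below `bound`; combined with a bound on $\operatorname{Im} G_{ii}$ at scale $\eta$ this gives a sup norm bound of order $\sqrt{\eta}$ on the eigenvector, as in the proof.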

Note that the proof was very easy since pointwise bounds on the diagonal elements of the resolvent were
available. It is possible to prove this theorem relying only on the local semicircle law, which is a conceptually
simpler input, in fact this was our traditional path in [39, 40, 41]. For example, in [41] we proved

Theorem 2.22 [41, Corollary 3.2] Let $H$ be a Wigner matrix with single entry distribution with a Gaussian
decay. Then for any $|E| < 2$, fixed $K$ and $2 < p \le \infty$ we have

$$P\Big\{\exists v : Hv = \lambda v, \; |\lambda - E| \le \frac{K}{N}, \; \|v\|_2 = 1, \; \|v\|_p \ge Q N^{-\frac12+\frac1p}\Big\} \le C e^{-c\sqrt{Q}}$$

for $Q$ and $N$ large enough.
Sketch of the proof. We will give the proof of a weaker result, where logarithmic factors are allowed.
Suppose that $Hv = \lambda v$ and $\lambda \in [-2 + \kappa_0, 2 - \kappa_0]$, with $\kappa_0 > 0$. Consider the decomposition

$$H = \begin{pmatrix} h & a^* \\ a & H^{(1)} \end{pmatrix} \tag{2.122}$$

introduced in Section 2.1, i.e., here $a = (h_{1,2}, \ldots, h_{1,N})^*$ and $H^{(1)}$ is the $(N-1) \times (N-1)$ matrix obtained
by removing the first row and first column from $H$. Let $\mu_\alpha$ and $u_\alpha$ (for $\alpha = 1, 2, \ldots, N-1$) denote the
eigenvalues and the normalized eigenvectors of $H^{(1)}$. From the eigenvalue equation $Hv = \lambda v$ we find

$$(h - \lambda)v_1 + a \cdot v' = 0,$$

$$v_1 a + (H^{(1)} - \lambda)v' = 0,$$

where we decomposed the eigenvector $v = (v_1, v')$, $v_1 \in \mathbb{R}$, $v' \in \mathbb{R}^{N-1}$. Solving the second equation for $v'$
we get $v' = v_1(\lambda - H^{(1)})^{-1}a$. From the normalization condition, $\|v\|^2 = v_1^2 + \|v'\|^2 = 1$ we thus obtain for
the first component of $v$ that

$$|v_1|^2 = \frac{1}{1 + a \cdot (\lambda - H^{(1)})^{-2} a} = \frac{1}{1 + \frac{1}{N}\sum_\alpha \frac{\xi_\alpha}{(\lambda - \mu_\alpha)^2}} \le \frac{N\eta^2}{\sum_{\mu_\alpha \in I}\xi_\alpha},$$

where in the second equality we set $\xi_\alpha = |\sqrt{N}\,a \cdot u_\alpha|^2$ and used the spectral representation of $H^{(1)}$. We also
chose an interval $I$ containing $\lambda$, of length $\eta = |I| = Q/N$. It is easy to check that $E\xi_\alpha = 1$ and that different $\xi_\alpha$'s are
essentially independent and they satisfy the following large deviation estimate:

ctSl 

where $m = |I|$ is the cardinality of the index set. There are several proofs of this fact, depending on the
condition on the single site distribution. For example, under the Gaussian decay condition, it was proved in
Lemma 4.7 of [41], which relies on the Hanson-Wright theorem [62].
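The formula expressing $|v_1|^2$ through the minor can be verified by hand in the smallest case. The sketch below takes a real symmetric $2 \times 2$ matrix $H = \begin{pmatrix} h & a \\ a & d \end{pmatrix}$, so that the minor $H^{(1)}$ is the number $d$, and compares the directly computed eigenvector component with $1/(1 + a(\lambda - H^{(1)})^{-2}a)$; the numerical values are arbitrary illustrative choices.

```python
import math

def first_component_sq(h: float, a: float, d: float):
    """For H = [[h, a], [a, d]] with a != 0: squared first entry of each normalized eigenvector, two ways."""
    out = []
    disc = math.sqrt((h - d) ** 2 + 4 * a * a)
    for lam in ((h + d - disc) / 2, (h + d + disc) / 2):
        # Direct computation: the eigenvector is proportional to (a, lam - h).
        direct = a * a / (a * a + (lam - h) ** 2)
        # Minor formula: |v_1|^2 = 1 / (1 + a * (lam - H1)^{-2} * a) with H1 = d.
        via_minor = 1 / (1 + a * a / (lam - d) ** 2)
        out.append((lam, direct, via_minor))
    return out

results = first_component_sq(0.3, 0.7, -1.1)
```

The two expressions agree because the characteristic equation gives $(\lambda - h)(\lambda - d) = a^2$; the first components of the two eigenvectors also satisfy the sum rule $\sum_\lambda |v_1|^2 = 1$.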

Let now $\mathcal{N}_I$ denote the number of eigenvalues of $H$ in $I$. From the local semicircle law (e.g. Theorem 1.9)
we know that $\mathcal{N}_I$ is of order $N|I|$ for any interval $I$ away from the edge and $|I| \gg 1/N$. We recall that the
eigenvalues of $H$, $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_N$, and the eigenvalues of $H^{(1)}$ are interlaced. This means that there
exist at least $\mathcal{N}_I - 1$ eigenvalues of $H^{(1)}$ in $I$. Therefore, using that the components of any eigenvector are
identically distributed, we have

Q 



'(3 V with = Av, ||v|| = 1, A e / and ||v||oo > 



< iV^P(3 V with Hv = Av, ||v|| = 1, A £ / and > 



A'^e-f ^ / (2.124) 

E ^" ^ —qT- and AA/ > cA^|/| j +CN'^¥{Mi< cN\I\) 
< C N^e~^^\ + C N^e-'- 
assuming that m'^rf'/Q'^ < cN\I\ = cN\I\, i.e. that Q > y/Tffj.

Here we used that the deviation from the semicircle law is subexponentially penalized,

$$P(\mathcal{N}_I \le cN|I|) \le e^{-c'\sqrt{N|I|}}$$

for sufficiently small $c$ and $c'$ if $I$ is away from the edge. Such a strong subexponential bound does not
directly follow from the local semicircle law Theorem 2.5 whose proof was outlined in Section 2.3, but it can
be proved for Wigner matrices with Gaussian decay, Theorem 1.9. □



3 Universality for Gaussian convolutions 

3.1 Strong local ergodicity of the Dyson Brownian Motion 

In this section, we consider the following general question. Suppose $\mu = e^{-N\mathcal{H}}/Z$ is a probability measure on
the configuration space $\mathbb{R}^N$ characterized by some Hamiltonian $\mathcal{H} : \mathbb{R}^N \to \mathbb{R}$, where $Z = \int e^{-N\mathcal{H}(x)}dx < \infty$
is the normalization. We will always assume that $\mathcal{H}$ is symmetric under permutation of the variables
$x = (x_1, x_2, \ldots, x_N) \in \mathbb{R}^N$. The typical example to keep in mind is the Hamiltonian of the general $\beta$-ensembles
(1.45), or the specific GOE ($\beta = 1$) or GUE ($\beta = 2$) cases.

We consider time dependent permutation-symmetric probability density $f_t(x)$, $t \ge 0$, with respect to the
measure $\mu(dx) = \mu(x)dx$, i.e. $\int f_t(x)\mu(dx) = 1$. The dynamics is characterized by the forward equation

$$\partial_t f_t = L f_t, \qquad t \ge 0, \tag{3.1}$$

with a given permutation-symmetric initial data $f_0$. The generator $L$ is defined via the Dirichlet form as

$$D(f) := D_\mu(f) = -\int f L f\, d\mu = \sum_{j=1}^N \frac{1}{2N}\int (\partial_j f)^2 d\mu, \qquad \partial_j = \frac{\partial}{\partial x_j}. \tag{3.2}$$

Formally, we have $L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla\mathcal{H})\nabla$. We will ignore the domain questions, we just mention that $D(f)$
is a semibounded quadratic form, so $L$ can be defined via the Friedrichs extension on $L^2(d\mu)$ and it can be
extended to $L^1$ as well. The dynamics is well defined for any $f_0 \in L^1(d\mu)$ initial data. For more details, see
Appendix A of [46].

Strictly speaking, we will consider a sequence of Hamiltonians $\mathcal{H}_N$ and corresponding dynamics $L_N$ and
$f_{t,N}$ parametrized by $N$, but the $N$-dependence will be omitted. All results will concern the $N \to \infty$ limit.

Alternatively to (3.1), one could describe the dynamics by a coupled system of stochastic differential 
equations (1.64) as mentioned in Section 1.6.2, but we will not use this formalism here. 

For any $k \ge 1$ we define the $k$-point correlation functions (marginals) of the probability measure $f_t\,d\mu$ by

$$p^{(k)}_{t,N}(x_1, x_2, \ldots, x_k) = \int_{\mathbb{R}^{N-k}} f_t(x)\mu(x)\,dx_{k+1}\ldots dx_N. \tag{3.3}$$

The correlation functions of the equilibrium measure are denoted by

$$p^{(k)}_{\mu,N}(x_1, x_2, \ldots, x_k) = \int_{\mathbb{R}^{N-k}} \mu(x)\,dx_{k+1}\ldots dx_N.$$

We now list our main assumptions on the initial distribution $f_0$ and on its evolution $f_t$. This formalism
is adjusted to generalized Wigner matrices; random covariance matrices require some minor modifications
(see [46] for details). We first define the subdomain

$$\Sigma_N := \big\{x \in \mathbb{R}^N, \; x_1 < x_2 < \ldots < x_N\big\} \tag{3.4}$$

of ordered sets of points $x$.

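Assumption I. The Hamiltonian $\mathcal{H}$ of the equilibrium measure has the form

$$\mathcal{H} = \mathcal{H}_N(x) = \beta\Big[\sum_{j=1}^N U(x_j) - \frac{1}{N}\sum_{i<j}\log|x_i - x_j|\Big], \tag{3.5}$$

where $\beta \ge 1$. The function $U : \mathbb{R} \to \mathbb{R}$ is smooth with $U'' \ge 0$, and

$$U(x) \ge C|x|^{\delta} \quad \text{for some } \delta > 0 \text{ and } |x| \text{ large}.$$

Note that this assumption is automatic for symmetric and hermitian Wigner matrices with the GOE or
GUE being the invariant measure, (1.44)-(1.45).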

Near the $x_{i+1} = x_i$ boundary component of $\Sigma_N$, the generator has the form

$$L = \frac{1}{4N}\partial_u^2 + \frac{\beta}{4N}\frac{1}{u}\partial_u + \text{regular operator}$$

in the relative coordinates $u = \frac{1}{2}(x_{i+1} - x_i)$ when $u \ll 1$. It is known (Appendix A of [46]) that for $\beta \ge 1$
the repelling force of the singular diffusion is sufficiently strong to prevent the particle from falling into the
origin, i.e., in the original coordinates the trajectories of the points do not cross. In particular, the ordering
of the points will be preserved under the dynamics (for a stochastic proof, see Lemma 4.3.3 of [4]). In the
sequel we will thus assume that $f_t$ is a probability measure on $\Sigma_N$. We continue to use the notation $f$ and
$\mu$ for the restricted measure. Note that the correlation functions $p^{(k)}$ from (3.3) are still defined on $\mathbb{R}^k$, i.e.,
their arguments remain unordered.

It follows from Assumption I that the Hessian matrix of $\mathcal{H}$ satisfies the following bound:

$$\big\langle v, \nabla^2\mathcal{H}(x)v\big\rangle \ge \frac{\beta}{N}\sum_{i<j}\frac{(v_i - v_j)^2}{(x_i - x_j)^2}, \qquad v = (v_1, \ldots, v_N) \in \mathbb{R}^N, \quad x \in \Sigma_N. \tag{3.6}$$

This convexity bound is the key assumption; our method works for a broad class of general Hamiltonians
as long as (3.6) holds. In particular, an arbitrary many-body potential function $V(x)$ can be added to the
Hamiltonians (3.5) as long as $V$ is convex on $\Sigma_N$.
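The convexity bound can be verified by a direct computation. The following is a sketch, assuming the Hamiltonian has the form $\mathcal{H}(x) = \beta\big[\sum_j U(x_j) - \frac{1}{N}\sum_{i<j}\log|x_i - x_j|\big]$ with $U'' \ge 0$.

```latex
% Second derivatives of the pair interaction, for i \ne j:
\partial_i^2\big(-\log|x_i-x_j|\big) = \frac{1}{(x_i-x_j)^2},
\qquad
\partial_i\partial_j\big(-\log|x_i-x_j|\big) = -\frac{1}{(x_i-x_j)^2}.
% Hence each pair contributes (v_i^2 + v_j^2 - 2 v_i v_j)/(x_i-x_j)^2, and
\big\langle v, \nabla^2 \mathcal{H}(x)\, v\big\rangle
  = \beta\Big[\sum_{j=1}^N U''(x_j)\, v_j^2
      + \frac{1}{N}\sum_{i<j}\frac{(v_i-v_j)^2}{(x_i-x_j)^2}\Big]
  \ge \frac{\beta}{N}\sum_{i<j}\frac{(v_i-v_j)^2}{(x_i-x_j)^2}.
```

The one-body potential $U$ enters only through the nonnegative term $\sum_j U''(x_j)v_j^2$, which is dropped in the last step; this is also why an additional convex potential $V$ does not spoil the bound.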

Assumption II. There exists a continuous, compactly supported density function $\varrho(x) \ge 0$, $\int_{\mathbb{R}}\varrho = 1$,
on the real line, independent of $N$, such that for any fixed $a, b \in \mathbb{R}$

$$\lim_{N\to\infty}\sup_{t \ge 0}\Big|\int \frac{1}{N}\sum_{j=1}^N \mathbf{1}(x_j \in [a,b])\, f_t(x)\,d\mu(x) - \int_a^b \varrho(x)\,dx\Big| = 0. \tag{3.7}$$

In other words, we assume that a limiting density exists; for Wigner matrices this is the semicircle law.
Let $\gamma_j = \gamma_{j,N}$ denote the location of the $j$-th point under the limiting density, i.e., $\gamma_j$ is defined by

$$N\int_{-\infty}^{\gamma_j}\varrho(x)\,dx = j, \qquad 1 \le j \le N, \quad \gamma_j \in \operatorname{supp}\varrho. \tag{3.8}$$

We will call $\gamma_j$ the classical location of the $j$-th point. Note that $\gamma_j$ may not be uniquely defined if the
support of $\varrho$ is not connected but in this case the next Assumption III will not be satisfied anyway.
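For the semicircle density $\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+}$ the classical locations defined by (3.8) can be computed by bisection. The sketch below is an illustration for this specific density only; the function names and the tolerance are assumptions, and a general $\varrho$ would need its own distribution function.

```python
import math

def semicircle_cdf(x: float) -> float:
    """Integral of sqrt(4 - t^2) / (2*pi) over (-2, x], with x clamped to [-2, 2]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4 - x * x) / (4 * math.pi) + math.asin(x / 2) / math.pi

def classical_locations(N: int, tol: float = 1e-12):
    """gamma_j solving N * cdf(gamma_j) = j for j = 1, ..., N, by bisection."""
    gammas = []
    for j in range(1, N + 1):
        lo, hi = -2.0, 2.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if N * semicircle_cdf(mid) < j:
                lo = mid
            else:
                hi = mid
        gammas.append((lo + hi) / 2)
    return gammas

gammas = classical_locations(10)
```

With this convention $\gamma_N$ sits at the upper edge $2$, and the symmetry of the semicircle law gives $\gamma_{N-j} = -\gamma_j$ for $1 \le j \le N-1$; the spacing of the $\gamma_j$ is of order $1/N$ in the bulk and of order $N^{-2/3}$ at the edges.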

Assumption III. There exists an $a > 0$ such that

$$\sup_{t \ge N^{-2a}}\int \frac{1}{N}\sum_{j=1}^N (x_j - \gamma_j)^2 f_t(x)\mu(dx) \le C N^{-1-2a} \tag{3.9}$$

with a constant $C$ uniformly in $N$.

Under Assumption II, the typical spacing between neighboring points is of order $1/N$ away from the
spectral edges, i.e., in the vicinity of any energy $E$ with $\varrho(E) > 0$. Assumption III guarantees that typically
the random points $x_j$ remain in the $N^{-1/2-a}$ vicinity of their classical location.



The final assumption is an upper bound on the local density. For any / e R, let Afj :— -'-(^i ^ ^) 

denote the number of points in /. 

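Assumption IV. For any compact subinterval $I_0\subset\{E : \varrho(E)>0\}$, and for any $\delta>0$, $\sigma>0$ there
are constants $C_n$, $n\in\mathbb{N}$, depending on $I_0$, $\delta$ and $\sigma$ such that for any interval $I\subset I_0$ with $|I|\ge N^{-1+\sigma}$ and
for any $K\ge 1$, we have

$$\sup_{\tau\ge N^{-2\mathfrak{a}+\delta}}\int \mathbf{1}\{\mathcal{N}_I\ge KN|I|\}\,f_\tau\,d\mu \le C_nK^{-n}, \qquad n=1,2,\dots, \tag{3.10}$$

where $\mathfrak{a}$ is the exponent from Assumption III.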

Note that for the symmetric or hermitian Wigner matrices, Assumption I is automatic, Assumption II is
the (global) semicircle law (1.27) and Assumption IV is the upper bound on the density (Lemma 1.8). The
really serious condition to check is (3.9).

The following main general theorem asserts that the local statistics of the points $x_j$ in the bulk with
respect to the time evolved distribution $f_t$ coincide with the local statistics with respect to the equilibrium
measure $\mu$ as long as $t\gg N^{-2\mathfrak{a}}$.

Theorem 3.1 [46, Theorem 2.1] Suppose that the Hamiltonian given in (3.5) satisfies Assumption I and
Assumptions II, III, and IV hold for the solution $f_t$ of the forward equation (3.1) with exponent $\mathfrak{a}$. Assume
that at time $t_0=N^{-2\mathfrak{a}}$ the density $f_{t_0}$ satisfies a bounded entropy condition, i.e.,

$$S_\mu(f_{t_0}) := \int f_{t_0}\log f_{t_0}\,d\mu \le CN^m \tag{3.11}$$

with some fixed exponent $m$. Let $E\in\mathbb{R}$ and $b>0$ be such that $\min\{\varrho(x) : x\in[E-b,E+b]\}>0$. Then for
any $\delta>0$, for any integer $k\ge 1$ and for any compactly supported continuous test function $O:\mathbb{R}^k\to\mathbb{R}$, we
have, with the notation $\tau := N^{-2\mathfrak{a}+\delta}$,

$$\lim_{N\to\infty}\,\sup_{t\ge\tau}\int_{E-b}^{E+b}\frac{dE'}{2b}\int_{\mathbb{R}^k}d\alpha_1\dots d\alpha_k\;O(\alpha_1,\dots,\alpha_k)\,\frac{1}{\varrho(E)^k}\Big(p^{(k)}_{t,N}-p^{(k)}_{\mu,N}\Big)\Big(E'+\frac{\alpha_1}{N\varrho(E)},\dots,E'+\frac{\alpha_k}{N\varrho(E)}\Big)=0. \tag{3.12}$$



We remark that the limit can be effectively controlled; in fact, in [46] we obtain that before the $N\to\infty$
limit the left hand side of (3.12) is bounded by $CN^{2\varepsilon'}\big[b^{-1}N^{-\frac{1}{3}(1+2\mathfrak{a})}+b^{-1/2}N^{-\delta/2}\big]$.

In many applications, the local equilibrium statistics can be explicitly computed and in the $b\to 0$ limit
they become independent of $E$; in particular this is the case for the classical matrix ensembles. The simplest
explicit formula is for the GUE case, when the correlation functions are given by the sine kernel (1.35).



3.2 The local relaxation flow 

The main idea behind the proof of Theorem 3.1 is to analyze the relaxation to equilibrium of the dynamics
(3.1). The equilibrium is given by an essentially convex Hamiltonian $\mathcal{H}$ so the Bakry-Emery method [10]
applies. This method was first used in the context of the Dyson Brownian motion in Section 5.1 of [43]; the
presentation here follows [46].

To explain the idea, assume, temporarily, that the potential $U$ in (3.5) is uniformly convex, i.e.

$$U''(x)\ge U_0>0.$$

This is certainly the case for the Gaussian ensembles when $U(x)=\frac{1}{4}x^2$. Then we have the following lower
bound on the Hessian of $\mathcal{H}$

$$\operatorname{Hess}\mathcal{H}\ge\beta U_0 \tag{3.13}$$

on the set $\Sigma_N$ (3.4) since the logarithmic potential is convex. It is essential to stress at this point that
(3.13) holds only in the open set $\Sigma_N$, since the second derivatives of the logarithmic interactions have a delta
function singularity (with the "wrong" sign) on the boundary. It requires a separate technical argument to
show that for $\beta\ge 1$ the points sufficiently repel each other so that the Dyson Brownian motion never leaves
the open set $\Sigma_N$ and thus the Bakry-Emery method applies. See a remark after Theorem 3.2.

We devote the next pedagogical section to recall the Bakry-Emery criterion in a general setup on $\mathbb{R}^N$.

3.2.1 Bakry-Emery method 

Let the probability measure $\mu$ on $\mathbb{R}^N$ be given by a strictly convex Hamiltonian $\mathcal{H}$:

$$d\mu(x)=\frac{e^{-\mathcal{H}(x)}}{Z}\,dx, \qquad \nabla^2\mathcal{H}(x)=\operatorname{Hess}\mathcal{H}(x)\ge K>0 \tag{3.14}$$

with some constant $K$, and let $L$ be the generator of the dynamics associated with the Dirichlet form

$$D(f)=D_\mu(f):=-\int fLf\,d\mu=\frac{1}{2}\int|\nabla f|^2\,d\mu$$

(note that in this presentation we neglect the prefactor $1/N$ originally present in (3.2)). Formally we have
$L=\frac{1}{2}\Delta-\frac{1}{2}(\nabla\mathcal{H})\nabla$. The operator $L$ is symmetric with respect to the measure $d\mu$, i.e.

$$\int fLg\,d\mu=\int(Lf)g\,d\mu=-\frac{1}{2}\int\nabla f\cdot\nabla g\,d\mu. \tag{3.15}$$

We define the relative entropy of any probability density $f$ with $\int f\,d\mu=1$ by

$$S(f)=S_\mu(f):=\int f\log f\,d\mu.$$

Both the Dirichlet form and the entropy are non-negative, they are zero only for $f=1$ and they measure the
distance of $f$ from equilibrium $f=1$. The entropy can be used to control the total variation norm directly
via the entropy inequality

$$\int|f-1|\,d\mu\le\sqrt{2S_\mu(f)}. \tag{3.16}$$

Let $f_t$ be the solution to the evolution equation

$$\partial_tf_t=Lf_t, \qquad t>0, \tag{3.17}$$

with a given initial condition $f_0$ and consider the evolution of the entropy $S(f_t)$ and the Dirichlet form
$D(\sqrt{f_t})$. Simple calculation shows

$$\partial_tS(f_t)=\int(Lf_t)\log f_t\,d\mu+\int f_t\,\frac{Lf_t}{f_t}\,d\mu=-\frac{1}{2}\int\frac{(\nabla f_t)^2}{f_t}\,d\mu=-4D(\sqrt{f_t}), \tag{3.18}$$

where we used that $\int Lf_t\,d\mu=0$ by (3.15). Similarly, we can compute the evolution of the Dirichlet form.
Let $h:=\sqrt{f_t}$ for simplicity, then

$$\partial_th=\frac{1}{2h}Lh^2=Lh+\frac{1}{2h}(\nabla h)^2.$$

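In the last step we used that $Lh^2=(\nabla h)^2+2hLh$, which can be seen either directly from $L=\frac{1}{2}\Delta-\frac{1}{2}(\nabla\mathcal{H})\nabla$
or from the following identity for any test function $g$:

$$\int gLh^2\,d\mu=-\frac{1}{2}\int\nabla g\cdot\nabla(h^2)\,d\mu=-\int h(\nabla g)(\nabla h)\,d\mu=-\int\big[\nabla(hg)-g\nabla h\big]\nabla h\,d\mu=\int g\big[(\nabla h)^2+2hLh\big]\,d\mu.$$

We compute (dropping the $t$ subscript for brevity)

$$\begin{aligned}
\partial_tD(\sqrt{f_t})&=\frac{1}{2}\partial_t\int(\nabla h)^2\,d\mu\\
&=\int(\nabla h)(\nabla Lh)\,d\mu+\frac{1}{2}\int(\nabla h)\cdot\nabla\frac{(\nabla h)^2}{h}\,d\mu\\
&=\int(\nabla h)[\nabla,L]h\,d\mu+\int(\nabla h)L(\nabla h)\,d\mu+\int\sum_{ij}\partial_ih\Big(\frac{\partial_{ij}h\,\partial_jh}{h}-\frac{(\partial_jh)^2\,\partial_ih}{2h^2}\Big)d\mu\\
&=-\frac{1}{2}\int(\nabla h)(\nabla^2\mathcal{H})\nabla h\,d\mu-\frac{1}{2}\int\sum_{ij}\Big(\partial_{ij}h-\frac{\partial_ih\,\partial_jh}{h}\Big)^2d\mu,
\end{aligned}\tag{3.19}$$

where we used the commutator

$$[\nabla,L]=-\frac{1}{2}(\nabla^2\mathcal{H})\nabla.$$

Therefore, under the convexity condition (3.14), we have

$$\partial_tD(\sqrt{f_t})\le-KD(\sqrt{f_t}). \tag{3.20}$$

Combining (3.18) and (3.20),

$$\partial_tD(\sqrt{f_t})\le\frac{K}{4}\,\partial_tS(f_t). \tag{3.21}$$

At $t=\infty$ the equilibrium is achieved, $f_\infty=1$, and both the entropy and the Dirichlet form are zero. After
integrating (3.21) back from $t=\infty$, we get the logarithmic Sobolev inequality

$$S(f_t)\le\frac{4}{K}D(\sqrt{f_t}) \tag{3.22}$$

for any $t\ge 0$, in particular for any initial distribution $f=f_0$. Inserting this back to (3.18), we have

$$\partial_tS(f_t)\le-KS(f_t).$$

Integrating from time zero, we obtain the exponential relaxation of the entropy on time scale $t\sim 1/K$

$$S(f_t)\le e^{-tK}S(f_0). \tag{3.23}$$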
Finally, we can integrate (3.18) from time $t/2$ to $t$ to get

$$S(f_t)-S(f_{t/2})=-4\int_{t/2}^{t}D(\sqrt{f_\tau})\,d\tau.$$

Using the positivity of the entropy $S(f_t)\ge 0$ on the left side and the monotonicity of the Dirichlet form
(from (3.20)) on the right side, we get

$$D(\sqrt{f_t})\le\frac{2}{t}\,S(f_{t/2}), \tag{3.24}$$

thus, using (3.23), we obtain exponential relaxation of the Dirichlet form on time scale $t\sim 1/K$

$$D(\sqrt{f_t})\le\frac{2}{t}\,e^{-tK/2}S(f_0).$$

We summarize the result of this calculation: 

Theorem 3.2 [10] Assuming the convexity bound on the Hamiltonian, $\nabla^2\mathcal{H}\ge K$ with some positive
constant $K$, the measure $\mu=e^{-\mathcal{H}}/Z$ satisfies the logarithmic Sobolev inequality

$$S(f)\le\frac{4}{K}D(\sqrt{f}), \qquad\text{for any density } f \text{ with } \int f\,d\mu=1, \tag{3.25}$$

and the dynamics (3.17) relaxes to equilibrium on the time scale $t\sim 1/K$ both in the sense of entropy and
Dirichlet form:

$$S(f_t)\le e^{-tK}S(f_0), \qquad D(\sqrt{f_t})\le\frac{2}{t}\,e^{-tK/2}S(f_0). \tag{3.26}$$

□

Technical remark. In our application, the dynamics will be restricted to the subset $\Sigma_N=\{x : x_1<x_2<\dots<x_N\}$,
and thus we need to check that the boundary term in the integration by parts

$$\int_{\partial\Sigma_N}\partial_ih\,\partial^2_{ij}h\,e^{-\mathcal{H}}\,dx=0 \tag{3.27}$$

(from the third line to the fourth line in (3.19)) vanishes. The role of $\mathcal{H}$ will be played by $N\mathcal{H}_N$ where $\mathcal{H}_N$
is defined in (3.5). Although the density function of the measure $e^{-N\mathcal{H}_N}$ behaves as $(x_{i+1}-x_i)^\beta$ near the
$x_{i+1}=x_i$ component of the boundary, hence it vanishes at the boundary, the function $f_t$ is the solution to a
parabolic equation with a singular drift, so in principle it may blow up at the boundary. A further complication
is that $h=\sqrt{f}$ and the derivative of the square root is singular. Nevertheless, by using parabolic regularity
theory and cutoff functions properly subordinated to the geometry of the set $\Sigma_N$, we can prove that (3.27)
vanishes. This is one reason why the restriction $\beta\ge 1$ is necessary. For the details, see Appendix B of [46].
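
The statements of Theorem 3.2 can be checked explicitly in the simplest one-dimensional Gaussian case. The sketch
below is an illustration under stated assumptions and is not part of the original argument: we take the Hamiltonian
$Kx^2/2$ on $\mathbb{R}$, so that the equilibrium is the centered Gaussian with variance $1/K$ and the dynamics (3.17) is an
Ornstein-Uhlenbeck flow; starting from a shifted Gaussian with mean $m_0$, the evolved measure is Gaussian with mean
$m_0e^{-Kt/2}$ and the same variance. The names `density`, `entropy`, `dirichlet_sqrt` and all numerical parameters
are hypothetical choices. For this family one expects equality both in the logarithmic Sobolev inequality (3.25) and in
the entropy decay of (3.26), which the quadrature below can be used to confirm.

```python
import numpy as np

K = 2.0                                    # convexity constant: H(x) = K x^2 / 2, so H'' = K
x = np.linspace(-12.0, 12.0, 200_001)
dx = x[1] - x[0]
mu = np.sqrt(K / (2 * np.pi)) * np.exp(-K * x**2 / 2)   # density of mu = e^{-H} / Z

def density(mean):
    """f = dN(mean, 1/K) / dmu, a probability density with respect to mu."""
    return np.exp(K * mean * x - K * mean**2 / 2)

def entropy(f):
    """S(f) = int f log f dmu, evaluated by a Riemann sum."""
    return np.sum(f * np.log(f) * mu) * dx

def dirichlet_sqrt(f):
    """D(sqrt f) = (1/2) int |grad sqrt f|^2 dmu, with a finite-difference gradient."""
    grad = np.gradient(np.sqrt(f), dx)
    return 0.5 * np.sum(grad**2 * mu) * dx

m0 = 1.5
times = np.array([0.0, 0.5, 1.0, 2.0])
# Under L = (1/2) d^2/dx^2 - (1/2) H' d/dx the mean decays like exp(-K t / 2).
S = np.array([entropy(density(m0 * np.exp(-K * t / 2))) for t in times])
D = np.array([dirichlet_sqrt(density(m0 * np.exp(-K * t / 2))) for t in times])
```

In this example `S` follows $S(f_0)e^{-tK}$ with $S(f_0)=Km_0^2/2$, and `4 / K * D` agrees with `S`, so the constants in
Theorem 3.2 cannot be improved in general.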



3.2.2 Universality of gap distribution of the Dyson Brownian Motion for Short Time 

Using the convexity bound (3.13) for the Dyson Brownian motion, the Bakry-Emery method guarantees that
$\mu$ satisfies the logarithmic Sobolev inequality and the relaxation time to equilibrium is of order one.
The following result is the main theorem of Section 3.2. It shows that the relaxation time is in fact much
shorter than order one at least locally and for observables that depend only on the eigenvalue differences.

Theorem 3.3 (Universality of the Dyson Brownian Motion for Short Time) [46, Theorem 4.1]
Suppose that the Hamiltonian $\mathcal{H}$ given in (3.5) satisfies the convexity bound (3.6) with $\beta\ge 1$. Let $f_t$ be the
solution of the forward equation (3.1) so that after time $t_0=N^{-2\mathfrak{a}}$ it satisfies $S_\mu(f_{t_0}) := \int f_{t_0}\log f_{t_0}\,d\mu\le CN^m$
for some $m$ fixed. Set

$$Q := \sup_{t\ge t_0}\sum_j\int(x_j-\gamma_j)^2f_t\,d\mu, \tag{3.28}$$

and assume that $Q\le CN^m$ with some exponent $m$. Fix $n\ge 1$ and an array of increasing positive integers,
$\mathbf{m}=(m_1,m_2,\dots,m_n)\in\mathbb{N}_+^n$. Let $G:\mathbb{R}^n\to\mathbb{R}$ be a bounded smooth function with compact support and set

$$\mathcal{G}_{i,\mathbf{m}}(x) := G\big(N(x_i-x_{i+m_1}),\,N(x_{i+m_1}-x_{i+m_2}),\dots,N(x_{i+m_{n-1}}-x_{i+m_n})\big). \tag{3.29}$$

Then for any sufficiently small $\varepsilon'>0$, there exist constants $C,c>0$, depending only on $\varepsilon'$ and $G$, such that
for any $J\subset\{1,2,\dots,N-m_n\}$ and any $\tau>3t_0=3N^{-2\mathfrak{a}}$, we have

$$\left|\int\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}(x)\,f_\tau\,d\mu-\int\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}(x)\,d\mu\right|\le CN^{\varepsilon'}\sqrt{|J|\,Q\,(N\tau)^{-1}}+Ce^{-cN^{\varepsilon'}}. \tag{3.30}$$

In Section 3.2.3 we explain the intuition behind the proof. The precise proof will be given in Section 3.2.4.



3.2.3 Intuitive proof of Theorem 3.3 

The key idea is that we can "speed up" the convergence to local equilibrium by modifying the dynamics by
adding an auxiliary potential $W(x)$ to the Hamiltonian. It will have the form

$$W(x) := \sum_{j=1}^NW_j(x_j), \qquad W_j(x) := \frac{1}{2R^2}(x-\gamma_j)^2, \tag{3.31}$$

i.e. it is a quadratic confinement on scale $R$ for each eigenvalue near its classical location, and we define

$$\widetilde{\mathcal{H}} := \mathcal{H}+W. \tag{3.32}$$

The new measure is denoted by

$$d\omega := \omega(x)\,dx, \qquad \omega := e^{-N\widetilde{\mathcal{H}}}/\widetilde{Z}$$

and it will be called the pseudo equilibrium measure (with a slight abuse of notation we will denote by
$\omega$ both the measure and its density function with respect to the Lebesgue measure). The corresponding
generator is denoted by $\widetilde{L}$. We will typically choose $R\ll 1$, so that the additional term $W$ substantially
increases the lower bound (3.13) on the Hessian, hence speeding up the dynamics from relaxation time on
scale $O(1)$ to $O(R^2)$. This is the first step of the proof and it will be formulated in Theorem 3.4 whose
proof basically follows the Bakry-Emery argument from Section 3.2.1.

In the second step, we consider an arbitrary probability measure of the form $q\omega$, with some function $q$,
and we control the difference of expectation values

$$\int\mathcal{G}\,q\,d\omega-\int\mathcal{G}\,d\omega \tag{3.33}$$

of the observables

$$\mathcal{G}=\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}$$

in terms of the entropy and the Dirichlet form of $q$ with respect to $\omega$. Eventually this will enable us to
compare the expectations

$$\int\mathcal{G}\,q\,d\omega-\int\mathcal{G}\,q'\,d\omega$$

for any two measures $q\omega$ and $q'\omega$, in particular for the measures $f_\tau\mu$ and $\mu$ that will be written in this form
by defining $q=f_\tau\mu/\omega$ and $q'=\mu/\omega$.
Here we face an $N$-problem: both the entropy and the Dirichlet form are extensive quantities. A
direct application of the entropy inequality (3.16) to (3.33) would estimate the observable $\mathcal{G}$, an order one
quantity, by a quantity of order $O(\sqrt{N})$. Instead, we can run the new dynamics up to some time $\tau$ and write

$$\int\mathcal{G}\,q\,d\omega-\int\mathcal{G}\,d\omega=\int\mathcal{G}(q-q_\tau)\,d\omega+\int\mathcal{G}(q_\tau-1)\,d\omega.$$

If $\tau$ is larger than the relaxation time of the new dynamics, then the second term is exponentially small by
the entropy inequality, and this exponential smallness suppresses the $N$-problem.

To estimate the first term, we want to capitalize on the fact that $\tau$ is small. By integrating back the time
derivative, $q_\tau-q=\int_0^\tau\partial_tq_t\,dt$, we could extract a factor proportional to $\tau$, but after using $\partial_tq_t=\widetilde{L}q_t$ and
integrating by parts we will have to differentiate the observable, which brings in an additional $N$ factor due to
its scaling. It seems that this method estimates a quantity of order one by a quantity of order $N$. However,
we have proved a new estimate, see (3.45) later, that controls $\int\mathcal{G}(q-q_\tau)\,d\omega$ by $(\tau D_\omega(\sqrt{q})/N)^{1/2}$; notice the
additional $N^{-1/2}$ factor. The key reason for this improvement is that the dynamics relaxes much faster in certain
directions, namely for observables depending only on differences of $x_i$'s. To extract this mechanism, we use
that the lower bound (3.6) on the Hessian is of order $N$ in the difference variables $v_i-v_j$ and this estimate
can be used to gain an additional $N$-factor; this is the content of Theorem 3.5. The estimate will have a free
parameter $\tau$ that can be optimized. This parameter stems from the method of the proof: we prove a time
independent inequality by a dynamical method, i.e., we run the flow up to some time $\tau$ and we estimate
$q-q_\tau$ and $q_\tau-q_\infty$ differently.

Finally, in the third step, we have to compare the original dynamics with the new one in the sense of
entropy and Dirichlet form since $D_\omega(\sqrt{f_\tau\mu/\omega})$ and $S_\omega(f_\tau\mu/\omega)$ need to be computed for the estimates in
the second step. These would be given by the standard decay to equilibrium estimates (3.26) if $f_\tau\mu/\omega$ were
evolving with respect to the modified dynamics, but $f_\tau$ evolves by the original dynamics. We thus need to
show that the error due to the modification of the dynamics by adding $W$ is negligible.

It turns out that the key quantity that determines how much error was made by the additional potential
is the norm of $W$, i.e.

$$\Lambda_\tau := \int(\nabla W)^2f_\tau\,d\mu.$$

Due to the explicit form of $W$, we have

$$\Lambda_\tau=\frac{1}{R^4}\int\sum_i(x_i-\gamma_i)^2f_\tau\,d\mu\le CN^{-2\mathfrak{a}}R^{-4} \tag{3.34}$$

using Assumption III (3.9). Given $\mathfrak{a}>0$, we can therefore choose an $R\ll 1$ so that we still have $\Lambda_\tau\ll 1$.
This will complete the proof.

Note that the speed of convergence is determined by the second derivative of the auxiliary potential,
while the modification in the Dirichlet form and the entropy is determined by $(\nabla W)^2$. So one can speed
up the dynamics and still compare Dirichlet forms and entropies of the two equilibrium measures if a strong
a priori bound (3.34) on $\Lambda$ is given. This is one of the reasons why the method works.

The other key observation is the effective use of the convexity bound (3.6) which exploits a crucial
property of the dynamics of the Dyson Brownian motion (1.64). The logarithmic interaction potential gives
rise to a singular force

$$F(x_i)=-\frac{1}{4}x_i-\frac{1}{2N}\sum_{j\ne i}\frac{1}{x_j-x_i} \tag{3.35}$$

acting on the $i$-th particle. Formally $F(x_i)$ is a mean field force, and if $x_j$ were distributed according to the
semicircle law, then the bulk of the sum would cancel the $-\frac{1}{4}x_i$ term. However, the effect of the neighboring
particles, $j=i\pm 1$, is huge: they exert a force of order one on $x_i$. Such a force may move the particle
$x_i$ by a distance of order $1/N$ within a very short time of order $1/N$. Note that by the order preserving
property of the dynamics, an interval of size $O(1/N)$ is roughly the whole space available for $x_i$, at least in
the bulk. Thus $x_i$ is likely to relax to its equilibrium within a time scale of order $1/N$ due to the strong
repulsive force from its neighbors. Of course this argument is not a proof since the other particles move
as well. However, our observables involve only eigenvalue differences and in the difference coordinates the
strong drift is persistently present. This indicates that the eigenvalue differences may relax to their local
equilibrium on a time scale almost of order $1/N$.
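
The two effects described above, the mean field cancellation and the order one force from a single neighbor, can be
seen numerically. The following sketch is only an illustration: it assumes the normalization
$F(x_i)=-\frac{1}{4}x_i-\frac{1}{2N}\sum_{j\ne i}(x_j-x_i)^{-1}$ for the force and places the particles exactly at the semicircle
quantiles instead of sampling them; the helper names `semicircle_quantiles` and `force` and the value $N=2000$ are
hypothetical.

```python
import numpy as np

def semicircle_quantiles(N, grid_size=400_001):
    """Classical locations gamma_j with int_{-2}^{gamma_j} rho_sc = j/N, by grid inversion."""
    grid = np.linspace(-2.0, 2.0, grid_size)
    cdf = 0.5 + grid * np.sqrt(4.0 - grid**2) / (4.0 * np.pi) + np.arcsin(grid / 2.0) / np.pi
    return np.interp(np.arange(1, N + 1) / N, cdf, grid)

def force(x):
    """F(x_i) = -x_i/4 - (1/(2N)) sum_{j != i} 1/(x_j - x_i) for every particle i."""
    N = len(x)
    diff = x[None, :] - x[:, None]          # diff[i, j] = x_j - x_i
    np.fill_diagonal(diff, np.inf)          # drops the j = i term, since 1/inf = 0
    return -x / 4.0 - (1.0 / diff).sum(axis=1) / (2.0 * N)

N = 2000
gamma = semicircle_quantiles(N)
F = force(gamma)
bulk = slice(N // 4, 3 * N // 4)            # indices well inside the spectrum
i = N // 2
# Contribution of the right neighbour alone to the sum in F(x_i).
right_neighbour_term = 1.0 / (2.0 * N * (gamma[i + 1] - gamma[i]))
```

In this idealized configuration the total force `F[bulk]` is small, while `right_neighbour_term` stays of order one
(close to $1/(2\pi)$ near the origin): the large nearest-neighbor contributions cancel only because the left and right
gaps are almost equal.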

For other observables, the relaxation time is necessarily longer. In particular, it can happen that there
is no precise cancellation from the bulk in (3.35), in which case the neighboring particles all feel the same
mean field drift and will move collectively. In fact, if the initial density profile substantially differs from the
semicircle, the relaxation to the semicircle may even take order one time (although such a scenario is excluded
in our case by (3.9)).

3.2.4 Detailed proof of Theorem 3.3

Every constant in this proof depends on $\varepsilon'$ and $G$, and we will not follow the precise dependence. Given
$\tau>0$, we define $R := \tau^{1/2}N^{-\varepsilon'/2}$.

We now introduce the pseudo equilibrium measure, $\omega_N=\omega=\psi\mu$, defined by

$$\psi := \frac{Z}{\widetilde{Z}}\exp(-NW),$$

where $\widetilde{Z}$ is chosen such that $\omega$ is a probability measure, in particular $\omega=e^{-N\widetilde{\mathcal{H}}}/\widetilde{Z}$ with

$$\widetilde{\mathcal{H}}=\mathcal{H}+W.$$

The potential $W$ was defined in (3.31) and it confines the $j$-th point $x_j$ near its classical location $\gamma_j$.

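The local relaxation flow is defined to be the reversible dynamics w.r.t. $\omega$. The dynamics is described by
the generator $\widetilde{L}$ defined by

$$\int f\widetilde{L}g\,d\omega=-\frac{1}{2N}\sum_j\int(\partial_jf)(\partial_jg)\,d\omega. \tag{3.36}$$

Explicitly, $\widetilde{L}$ is given by

$$\widetilde{L}=L-\sum_jb_j\partial_j, \qquad b_j=W_j'(x_j)=\frac{x_j-\gamma_j}{R^2}. \tag{3.37}$$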

Since the additional potential $W_j$ is uniformly convex with

$$\inf_j\,\inf_{x\in\mathbb{R}}W_j''(x)\ge R^{-2}, \tag{3.38}$$

by (3.6) and $\beta\ge 1$ we have

$$\langle v,\nabla^2\widetilde{\mathcal{H}}(x)\,v\rangle\ge\frac{1}{R^2}\|v\|^2+\frac{1}{N}\sum_{i<j}\frac{(v_i-v_j)^2}{(x_i-x_j)^2}, \qquad v\in\mathbb{R}^N. \tag{3.39}$$

The $R^{-2}$ in the first term comes from the additional convexity of the local interaction and it enhances the
local Dirichlet form dissipation. In particular we have the uniform lower bound

$$\nabla^2\widetilde{\mathcal{H}}=\frac{1}{N}\operatorname{Hess}(-\log\omega)\ge R^{-2}.$$

This guarantees that the relaxation time to equilibrium for the $\widetilde{L}$ dynamics is bounded above by $CR^2$.
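
The structure of the bound (3.39) can be made concrete on a small configuration. The sketch below is an illustration
only: it assumes a Hamiltonian of the form $\beta[\sum_jU(x_j)-\frac{1}{N}\sum_{i<j}\log|x_j-x_i|]$ with $U(x)=x^2/4$ (the form is not
restated in this section), a quadratic confinement with second derivative $R^{-2}$, and an arbitrary ordered configuration;
the names `log_interaction_hessian`, `hess` and the parameter values are hypothetical. It builds the Hessian explicitly
and inspects its spectrum.

```python
import numpy as np

def log_interaction_hessian(x):
    """Matrix M with <v, M v> = sum_{i<j} (v_i - v_j)^2 / (x_i - x_j)^2."""
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, np.inf)
    off = -1.0 / diff**2                     # off-diagonal entries; the diagonal is zero
    return off - np.diag(off.sum(axis=1))    # diagonal entries sum_{j != i} (x_i - x_j)^{-2}

N, beta, R = 50, 1.0, 0.1
# An arbitrary strictly increasing configuration (not a sample of any ensemble).
x = np.linspace(-2.0, 2.0, N) + 0.01 * np.sin(np.arange(N))
M = log_interaction_hessian(x)
# Hessian of H + W: beta * (U'' I + M / N) from H, plus R^{-2} I from the confinement W.
hess = beta * (0.5 * np.eye(N) + M / N) + np.eye(N) / R**2
eigenvalues = np.linalg.eigvalsh(hess)
ones = np.ones(N) / np.sqrt(N)               # the centre-of-mass direction
```

The smallest eigenvalue equals $\beta/2+R^{-2}$ and is attained in the direction $(1,\dots,1)$, which the logarithmic
interaction does not see; all directions involving differences $v_i-v_j$ have strictly larger eigenvalues. This is the
extra convexity in the difference variables that the second term of (3.39) records.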

The first ingredient to prove Theorem 3.3 is the analysis of the local relaxation flow which satisfies the 
logarithmic Sobolev inequality and the following dissipation estimate. 

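Theorem 3.4 Suppose (3.39) holds. Consider the forward equation

$$\partial_tq_t=\widetilde{L}q_t, \qquad t\ge 0, \tag{3.40}$$

with initial condition $q_0=q$ and with reversible measure $\omega$. Assume that $q\in L^\infty(d\omega)$. Then we have the
following estimates

$$\partial_tD_\omega(\sqrt{q_t})\le-\frac{1}{R^2}D_\omega(\sqrt{q_t})-\frac{1}{2N^2}\int\sum_{i<j}\frac{(\partial_i\sqrt{q_t}-\partial_j\sqrt{q_t})^2}{(x_i-x_j)^2}\,d\omega, \tag{3.41}$$

$$\frac{1}{2N^2}\int_0^\infty ds\int\sum_{i<j}\frac{(\partial_i\sqrt{q_s}-\partial_j\sqrt{q_s})^2}{(x_i-x_j)^2}\,d\omega\le D_\omega(\sqrt{q}), \tag{3.42}$$

and the logarithmic Sobolev inequality

$$S_\omega(q)\le CR^2D_\omega(\sqrt{q}) \tag{3.43}$$

with a universal constant $C$. Thus the time to equilibrium is of order $R^2$:

$$S_\omega(q_t)\le e^{-Ct/R^2}S_\omega(q). \tag{3.44}$$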


Proof. This theorem can be proved following the standard argument presented in Section 3.2.1. The key
additional input is the convexity bound (3.6) which gives rise to the second term on the r.h.s. of (3.41) from
the last line of (3.19). In particular, this serves as an upper bound on $\partial_tD_\omega(\sqrt{q_t})$, thus integrating (3.41)
one obtains (3.42). □

The estimate (3.42) on the second term in (3.39) plays a key role in the next theorem.



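Theorem 3.5 Suppose that Assumption I holds and let $q\in L^\infty$ be a density, $\int q\,d\omega=1$. Fix $n\ge 1$,
$\mathbf{m}\in\mathbb{N}_+^n$, let $G:\mathbb{R}^n\to\mathbb{R}$ be a bounded smooth function with compact support and recall the definition of
$\mathcal{G}_{i,\mathbf{m}}$ from (3.29). Then for any $J\subset\{1,2,\dots,N-m_n\}$ and any $\tau>0$ we have

$$\left|\int\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}(x)\,q\,d\omega-\int\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}(x)\,d\omega\right|\le C\left(\frac{|J|\,D_\omega(\sqrt{q})\,\tau}{N^2}\right)^{1/2}+C\sqrt{S_\omega(q)}\,e^{-c\tau/R^2}. \tag{3.45}$$

Proof. For simplicity, we will consider the case when $\mathbf{m}=(1,2,\dots,n)$; the general case easily follows by
appropriately redefining the function $G$. Let $q_t$ satisfy

$$\partial_tq_t=\widetilde{L}q_t, \qquad t\ge 0,$$

with an initial condition $q_0=q$. We write

$$\int\Big[\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\Big](q-1)\,d\omega=\int\Big[\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\Big](q-q_\tau)\,d\omega+\int\Big[\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\Big](q_\tau-1)\,d\omega. \tag{3.46}$$

The second term can be estimated by the entropy inequality (3.16), by the decay of the entropy (3.44) and
the boundedness of $G$; this gives the second term in (3.45).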

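To estimate the first term in (3.46), we have, by differentiation, by $\partial_tq_t=\widetilde{L}q_t$ and by (3.36)

$$\begin{aligned}
\int\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}(x)\,q_\tau\,d\omega&-\int\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}(x)\,q_0\,d\omega\\
&=-\frac{1}{2N}\int_0^\tau ds\int\sum_{i\in J}\sum_{k=1}^n\partial_kG\big(N(x_i-x_{i+1}),\dots,N(x_{i+n-1}-x_{i+n})\big)\big[\partial_{i+k-1}q_s-\partial_{i+k}q_s\big]\,d\omega.
\end{aligned}$$

From the Schwarz inequality and $\partial q=2\sqrt{q}\,\partial\sqrt{q}$, the last term is bounded in absolute value by

$$\begin{aligned}
\sum_{k=1}^n&\left[\int_0^\tau ds\int\sum_{i\in J}\Big[\partial_kG\big(N(x_i-x_{i+1}),\dots,N(x_{i+n-1}-x_{i+n})\big)\Big]^2(x_{i+k-1}-x_{i+k})^2\,q_s\,d\omega\right]^{1/2}\\
&\times\left[\int_0^\tau ds\int\frac{1}{N^2}\sum_{i\in J}\frac{1}{(x_{i+k-1}-x_{i+k})^2}\big[\partial_{i+k-1}\sqrt{q_s}-\partial_{i+k}\sqrt{q_s}\big]^2\,d\omega\right]^{1/2}\le C\left(\frac{|J|\,D_\omega(\sqrt{q})\,\tau}{N^2}\right)^{1/2},
\end{aligned}\tag{3.47}$$

where we have used (3.42) and that

$$\Big[\partial_kG\big(N(x_i-x_{i+1}),\dots,N(x_{i+k-1}-x_{i+k}),\dots,N(x_{i+n-1}-x_{i+n})\big)\Big]^2(x_{i+k-1}-x_{i+k})^2\le CN^{-2},$$

since $G$ is smooth and compactly supported. □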

As a comparison to Theorem 3.5, we state the following result which can be proved in a similar way.

Lemma 3.6 Let $G:\mathbb{R}\to\mathbb{R}$ be a bounded smooth function with compact support and let a sequence $E_i$ be
fixed. Then we have, for any $\tau>0$,

$$\left|\frac{1}{N}\sum_i\int G\big(N(x_i-E_i)\big)\,q\,d\omega-\frac{1}{N}\sum_i\int G\big(N(x_i-E_i)\big)\,d\omega\right|\le C\sqrt{S_\omega(q)\,\tau}+C\sqrt{S_\omega(q)}\,e^{-c\tau/R^2}. \tag{3.48}$$

Notice that by exploiting the local Dirichlet form dissipation coming from the second term on the r.h.s.
of (3.41), we have gained the crucial factor $N^{-1/2}$ in the estimate (3.45) compared with (3.48).

The final ingredient to prove Theorem 3.3 is the following entropy and Dirichlet form estimates. 

Theorem 3.7 Suppose that (3.6) holds and recall $\tau=R^2N^{\varepsilon'}\ge 3t_0$ with $t_0=N^{-2\mathfrak{a}}$. Assume that $S_\mu(f_{t_0})\le CN^m$
with some fixed $m$. Let $g_t := f_t/\psi$. Then the entropy and the Dirichlet form satisfy the estimates:

$$S_\omega(g_{\tau/2})\le CNR^{-2}Q, \qquad D_\omega(\sqrt{g_\tau})\le CNR^{-4}Q. \tag{3.49}$$

Proof. Recall that $\partial_tf_t=Lf_t$. The standard estimate on the entropy of $f_t$ with respect to the invariant
measure is obtained by differentiating the entropy twice and using the logarithmic Sobolev inequality (see
Section 3.2.1). The entropy and the Dirichlet form in (3.49) are, however, computed with respect to the
measure $\omega$. This yields the additional second term in the identity (3.18) and we use the following identity
[108] that holds for any probability density $\psi_t$:

$$\partial_tS_\mu(f_t|\psi_t)=-\frac{2}{N}\sum_j\int(\partial_j\sqrt{g_t})^2\,\psi_t\,d\mu+\int g_t(L-\partial_t)\psi_t\,d\mu,$$

where $g_t := f_t/\psi_t$ and

$$S_\mu(f_t|\psi_t) := \int f_t\log\frac{f_t}{\psi_t}\,d\mu$$

is the relative entropy.

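In our application we set $\psi_t$ to be time independent, $\psi_t=\psi=\omega/\mu$, hence $S_\mu(f_t|\psi)=S_\omega(g_t)$ and we
have, by using (3.37),

$$\partial_tS_\omega(g_t)=-\frac{2}{N}\sum_j\int(\partial_j\sqrt{g_t})^2\,d\omega+\int\widetilde{L}g_t\,d\omega+\sum_j\int b_j\,\partial_jg_t\,d\omega.$$

Since $\omega$ is $\widetilde{L}$-invariant, the middle term on the right hand side vanishes, and from the Schwarz inequality

$$\partial_tS_\omega(g_t)\le-D_\omega(\sqrt{g_t})+CN\sum_j\int b_j^2\,g_t\,d\omega\le-D_\omega(\sqrt{g_t})+CN\Lambda, \tag{3.50}$$

where we defined

$$\Lambda := QR^{-4}=\sup_{t\ge t_0}R^{-4}\sum_j\int(x_j-\gamma_j)^2f_t\,d\mu. \tag{3.51}$$

Together with (3.43), we have

$$\partial_tS_\omega(g_t)\le-CR^{-2}S_\omega(g_t)+CN\Lambda, \qquad t\ge N^{-2\mathfrak{a}}. \tag{3.52}$$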

To obtain the first inequality in (3.49), we integrate (3.52) from $t_0=N^{-2\mathfrak{a}}$ to $\tau/2$, using that $\tau=R^2N^{\varepsilon'}$
and $S_\omega(g_{t_0})\le CN^m+N^2Q$ with some finite $m$, depending on $\mathfrak{a}$. This a priori bound follows from

$$S_\omega(g_{t_0})=S_\mu(f_{t_0}|\psi)=S_\mu(f_{t_0})-\log Z+\log\widetilde{Z}+N\int f_{t_0}W\,d\mu\le CN^m+N^2Q, \tag{3.53}$$

where we used that $|\log Z|\le CN^m$ and $|\log\widetilde{Z}|\le CN^m$, which can be easily checked. The second inequality
in (3.49) can be obtained from the first one by integrating (3.50) from $t=\tau/2$ to $t=\tau$ and using the
monotonicity of the Dirichlet form in time. □

Finally, we complete the proof of Theorem 3.3. Recall that $\tau=R^2N^{\varepsilon'}$ and $t_0=N^{-2\mathfrak{a}}$. Choose
$q_\tau := g_\tau=f_\tau/\psi$ to be the density $q$ in Theorem 3.5. The condition $q_\tau\in L^\infty$ can be guaranteed by an approximation
argument. Then Theorem 3.7, Theorem 3.5 together with (3.53) and the fact that $\Lambda=Q\tau^{-2}N^{2\varepsilon'}$ directly
imply that

$$\left|\int\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\,f_\tau\,d\mu-\int\frac{1}{N}\sum_{i\in J}\mathcal{G}_{i,\mathbf{m}}\,d\omega\right|\le CN^{\varepsilon'}\sqrt{|J|\,Q\,(\tau N)^{-1}}+Ce^{-cN^{\varepsilon'}}, \tag{3.54}$$

i.e., the local statistics of $f_\tau\mu$ and $\omega$ can be compared. Clearly, equation (3.54) also holds for the special
choice $f_0=1$ (for which $f_\tau=1$), i.e., local statistics of $\mu$ and $\omega$ can also be compared. This completes the
proof of Theorem 3.3. □

3.3 From gap distribution to correlation functions: Sketch of the proof of Theorem 3.1.

Our main result Theorem 3.1 will follow from Theorem 3.3 and from the fact that in case $\tau\ge N^{-2\mathfrak{a}+\delta}$, the
assumption (3.9) guarantees that

$$N^{\varepsilon'}\sqrt{|J|\,Q\,(\tau N)^{-1}}\le CN^{\varepsilon'-\delta/2}=CN^{-\delta/6}\to 0$$

with the choice $\varepsilon'=\delta/3$ and using $|J|\le N$. Therefore the local statistics of observables involving eigenvalue
differences coincide in the $N\to\infty$ limit.

To complete the proof of Theorem 3.1, we will have to show that the convergence of the observables $\mathcal{G}_{i,\mathbf{m}}$
is sufficient to identify the correlation functions of the $x_i$'s in the sense prescribed in Theorem 3.1. This is
a fairly standard technical argument, and the details will be given in Appendix B. Here we just summarize
the main points.

Theorem 3.3 detects the joint behavior of eigenvalue differences on the correct scale $1/N$, due to the
factor $N$ in the argument of $G$ in (3.29). The slight technical problem is that the observable (3.29), and its
averaged version (3.30), involve fixed indices of eigenvalues, while correlation functions involve cumulative
statistics.

To understand this subtlety, consider $n=1$ for simplicity and let $m_1=1$, say. The observable (3.30)
answers the question: "What is the empirical distribution of differences of consecutive eigenvalues?",
in other words (3.30) directly identifies the gap distribution defined in Section 1.5.1. Correlation functions
answer the question: "What is the probability that there are two eigenvalues at a fixed distance away from
each other?", in other words they are not directly sensitive to the possible other eigenvalues in between. Of
course these two questions are closely related and it is easy to deduce the answers from each other. This is
exactly what the calculation (1.38) has achieved in one direction; now we need to go in the other direction:
identify correlation functions from (generalized) gap distributions.
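
The difference between the two kinds of statistics can be made concrete on a sampled matrix. The following sketch is
an illustration only and not part of the argument: it samples one GUE matrix normalized so that the spectrum fills
$[-2,2]$, and computes both consecutive gaps (fixed indices) and a pair count (all pairs within a given rescaled
distance, regardless of the eigenvalues in between). The names `sample_gue`, `rho_sc`, the seed, the size $N=400$ and
the window $|\lambda|<1$ are assumptions made for the example.

```python
import numpy as np

def sample_gue(N, rng):
    """Hermitian matrix with E|h_ij|^2 = 1/N, so that the spectrum fills [-2, 2]."""
    A = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (A + A.conj().T) / np.sqrt(2 * N)

def rho_sc(x):
    """Semicircle density."""
    return np.sqrt(np.maximum(4.0 - x**2, 0.0)) / (2.0 * np.pi)

rng = np.random.default_rng(1)
N = 400
lam = np.linalg.eigvalsh(sample_gue(N, rng))
bulk = lam[np.abs(lam) < 1.0]
# Gap statistics: consecutive indices, rescaled by N * density to mean spacing one.
gaps = N * rho_sc(bulk[:-1]) * np.diff(bulk)
# Two-point statistics: all ordered pairs at rescaled distance in (0, 2),
# no matter how many eigenvalues lie between them.
dist = N * rho_sc(bulk[:, None]) * (bulk[None, :] - bulk[:, None])
pair_count = np.sum((dist > 0) & (dist < 2.0))
```

Every consecutive pair with gap below 2 is counted in `pair_count`, but so are pairs separated by one or more
intermediate eigenvalues; recovering one statistic from the other is exactly the resummation over index differences
carried out in the text.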

In fact this direction is easier and the essence is given by the following formula:

$$\begin{aligned}
\int_{E-b}^{E+b}\frac{dE'}{2b}&\int_{\mathbb{R}^n}d\alpha_1\dots d\alpha_n\;O(\alpha_1,\dots,\alpha_n)\,\frac{1}{\varrho(E)^n}\,p^{(n)}_{\tau,N}\Big(E'+\frac{\alpha_1}{N\varrho(E)},\dots,E'+\frac{\alpha_n}{N\varrho(E)}\Big)\\
&=C_{N,n}\int_{E-b}^{E+b}\frac{dE'}{2b}\int\sum_{i_1\ne i_2\ne\dots\ne i_n}\widetilde{O}\big(N(x_{i_1}-E'),N(x_{i_1}-x_{i_2}),\dots,N(x_{i_1}-x_{i_n})\big)\,f_\tau\,d\mu\\
&=C_{N,n}\,n!\int_{E-b}^{E+b}\frac{dE'}{2b}\int\sum_{\mathbf{m}\in S_n}\sum_{i=1}^NY_{i,\mathbf{m}}(E',x)\,f_\tau\,d\mu,
\end{aligned}\tag{3.55}$$

with $C_{N,n} := N^n(N-n)!/N!=1+O_n(N^{-1})$, where we let $S_n$ denote the set of increasing positive integers,
$\mathbf{m}=(m_2,m_3,\dots,m_n)\in\mathbb{N}_+^{n-1}$, $m_2<m_3<\dots<m_n$, and we introduced

$$\begin{aligned}
Y_{i,\mathbf{m}}(E',x)&:=\widetilde{O}\big(N(x_i-E'),N(x_i-x_{i+m_2}),\dots,N(x_i-x_{i+m_n})\big),\\
\widetilde{O}(u_1,u_2,\dots,u_n)&:=O\big(\varrho(E)u_1,\varrho(E)(u_1-u_2),\dots,\varrho(E)(u_1-u_n)\big).
\end{aligned}\tag{3.56}$$

We will set $Y_{i,\mathbf{m}}=0$ if $i+m_n>N$. The first equality of (3.55) is just the definition of the correlation function
after a trivial rescaling. From the second to the third line we first noticed that by permutational symmetry
of $p^{(n)}_{\tau,N}$ we can assume that $O$ is symmetric and thus we can restrict the summation to $i_1<i_2<\dots<i_n$
upon an overall factor $n!$. Then we changed indices $i=i_1$, $i_2=i+m_2$, $i_3=i+m_3,\dots$, and performed a
resummation over all index differences encoded in $\mathbf{m}$. Apart from the first variable $N(x_{i_1}-E')$, the function
$Y_{i,\mathbf{m}}$ is of the form (3.29), so Theorem 3.3 will apply. The dependence on the first variable will be negligible
after the $dE'$ integration on a macroscopic interval.

To control the error terms in this argument, especially to show that even the error terms in the potentially
infinite sum over $\mathbf{m}\in S_n$ converge, one needs an a priori bound on the local density. This is where Assumption
IV (3.10) is used. For the details, see Appendix B. □

4 The Green function comparison theorems



A simplified version of the Green function comparison theorem was already stated in Theorem 1.4; here we
state the complete version, Theorem 4.1. It will lead quickly to Theorem 4.2 stating that the correlation
functions of eigenvalues of two matrix ensembles are identical on scale $1/N$ provided that the first four
moments of all matrix elements of these two ensembles are almost identical. Here we do not assume that the
real and imaginary parts are i.i.d., hence the $k$-th moment of $h_{ij}$ is understood as the collection of numbers
$\int\bar{h}^sh^{k-s}\,\nu_{ij}(dh)$, $s=0,1,2,\dots,k$. The related Theorem 1.5 from [96] compares the joint distribution of
individual eigenvalues (which is not covered by our Theorem 4.1) but it does not address directly the
matrix elements of Green functions. In Section 4.3 we will sketch some ideas of the proof of Theorem 1.5
to point out the differences between the two results. The key input for both theorems is the local semicircle
law on the almost optimal scale $N^{-1+\varepsilon}$. The eigenvalue perturbation used in Theorem 1.5 requires certain
estimates on the level repulsion; the proof of Theorem 4.1 is a straightforward resolvent perturbation theory.

Theorem 4.1 (Green function comparison) [48, Theorem 2.3] Suppose that we have two generalized
$N\times N$ Wigner matrices, $H^{(v)}$ and $H^{(w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$
and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform subexponential decay condition

$$\mathbb{P}(|v_{ij}|\ge x)\le C\exp(-x^\vartheta), \qquad \mathbb{P}(|w_{ij}|\ge x)\le C\exp(-x^\vartheta),$$

with some $C,\vartheta>0$. Fix a bijective ordering map on the index set of the independent matrix elements,

$$\phi:\{(i,j) : 1\le i\le j\le N\}\to\{1,\dots,\gamma(N)\}, \qquad \gamma(N) := \frac{N(N+1)}{2},$$

and denote by $H_\gamma$ the generalized Wigner matrix whose matrix elements $h_{ij}$ follow the $v$-distribution if
$\phi(i,j)\le\gamma$ and they follow the $w$-distribution otherwise; in particular $H_0=H^{(w)}$ and $H_{\gamma(N)}=H^{(v)}$. Let
$\kappa>0$ be arbitrary and suppose that, for any small parameter $\tau>0$ and for any $y\ge N^{-1+\tau}$, we have the
following estimate on the diagonal elements of the resolvent

$$\mathbb{P}\left(\max_{0\le\gamma\le\gamma(N)}\,\max_{1\le k\le N}\,\max_{|E|\le 2-\kappa}\left|\Big(\frac{1}{H_\gamma-E-iy}\Big)_{kk}\right|\le N^{2\tau}\right)\ge 1-CN^{-c\log\log N} \tag{4.1}$$

with some constants $C,c$ depending only on $\tau,\kappa$. Moreover, we assume that the first three moments of $v_{ij}$
and $w_{ij}$ are the same, i.e.

$$\mathbb{E}\,\bar{v}_{ij}^{\,s}v_{ij}^{\,u}=\mathbb{E}\,\bar{w}_{ij}^{\,s}w_{ij}^{\,u}, \qquad 0\le s+u\le 3,$$

and the difference between the fourth moments of $v_{ij}$ and $w_{ij}$ is much less than 1, say

$$\Big|\mathbb{E}\,\bar{v}_{ij}^{\,s}v_{ij}^{\,4-s}-\mathbb{E}\,\bar{w}_{ij}^{\,s}w_{ij}^{\,4-s}\Big|\le N^{-\delta}, \qquad s=0,1,2,3,4, \tag{4.2}$$

for some given $\delta>0$. Let $\varepsilon>0$ be arbitrary and choose an $\eta$ with $N^{-1-\varepsilon}\le\eta\le N^{-1}$. For any sequence
of positive integers $k_1,\dots,k_n$, set complex parameters $z_j^m=E_j^m\pm i\eta$, $j=1,\dots,k_m$, $m=1,\dots,n$, with
$|E_j^m|\le 2-2\kappa$ and with an arbitrary choice of the $\pm$ signs. Let $G^{(v)}(z)=(H^{(v)}-z)^{-1}$ denote the resolvent
and let $F(x_1,\dots,x_n)$ be a function such that for any multi-index $\alpha=(\alpha_1,\dots,\alpha_n)$ with $1\le|\alpha|\le 5$ and for
any $\varepsilon'>0$ sufficiently small, we have

$$\max\Big\{|\partial^\alpha F(x_1,\dots,x_n)| : \max_j|x_j|\le N^{\varepsilon'}\Big\}\le N^{C_0\varepsilon'} \tag{4.3}$$

and

$$\max\Big\{|\partial^\alpha F(x_1,\dots,x_n)| : \max_j|x_j|\le N^2\Big\}\le N^{C_0} \tag{4.4}$$

for some constant $C_0$.

Then, there is a constant $C_1$, depending on $\vartheta$, $\sum_mk_m$ and $C_0$ such that for any $\eta$ with $N^{-1-\varepsilon}\le\eta\le N^{-1}$
and for any choices of the signs in the imaginary part of $z_j^m$, we have

$$\left|\mathbb{E}F\left(\frac{1}{N^{k_1}}\operatorname{Tr}\Big[\prod_{j=1}^{k_1}G^{(v)}(z_j^1)\Big],\dots,\frac{1}{N^{k_n}}\operatorname{Tr}\Big[\prod_{j=1}^{k_n}G^{(v)}(z_j^n)\Big]\right)-\mathbb{E}F\big(G^{(v)}\to G^{(w)}\big)\right|\le C_1N^{-1/2+C_1\varepsilon}+C_1N^{-\delta+C_1\varepsilon}, \tag{4.5}$$

where the arguments of $F$ in the second term are changed from the Green functions of $H^{(v)}$ to $H^{(w)}$ and all
other parameters remain unchanged.
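
To make the moment matching hypothesis concrete in the real symmetric case, the following sketch compares the first
four moments of two simple entry distributions with those of a standard Gaussian. It is an illustration only; the
helper `moments` and the two discrete laws are choices made for this example and do not appear in the original text.

```python
import numpy as np

def moments(values, probs, up_to=4):
    """Return [E x, E x^2, ..., E x^up_to] for a discrete real distribution."""
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    return np.array([np.sum(probs * values**k) for k in range(1, up_to + 1)])

gaussian = np.array([0.0, 1.0, 0.0, 3.0])             # first four moments of N(0, 1)
bernoulli = moments([-1.0, 1.0], [0.5, 0.5])           # symmetric +-1 entries
three_point = moments([-np.sqrt(3), 0.0, np.sqrt(3)], [1 / 6, 2 / 3, 1 / 6])
```

The three-point law matches all four Gaussian moments, so a Wigner matrix with such entries can be compared with the
GOE directly. The symmetric Bernoulli law matches only the first three; its fourth moment is 1 instead of 3, so the
fourth moment condition (4.2) is not satisfied for this pair.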

Remark 1: We formulated Theorem 4.1 for functions of traces of monomials of the Green function because
this is the form we need in the application. However, the result (and the proof we are going to present)
holds directly for matrix elements of monomials of Green functions as well; for the precise statement, see
[48]. We also remark that Theorem 4.1 holds for generalized Wigner matrices if $C_{sup}=\sup_{ij}N\sigma_{ij}^2<\infty$.
The positive lower bound on the variances, $C_{inf}>0$ in (1.17), is not necessary for this theorem.

Remark 2: Although we state Theorem 4.1 for hermitian and symmetric ensembles, similar results hold 
for real and complex sample covariance ensembles; the modification of the proof is obvious. 

The following result is the main corollary of Theorem 4.1 which will be proven later in the section. The 
important statement is Theorem 4.1, the proof of its corollary is a fairly straightforward technicality. 

Theorem 4.2 (Correlation function comparison) [48, Theorem 6.4] Suppose the assumptions of Theorem 4.1
hold. Let $p^{(k)}_{v,N}$ and $p^{(k)}_{w,N}$ be the $k$-point functions of the eigenvalues w.r.t. the probability law of
the matrix $H^{(v)}$ and $H^{(w)}$, respectively. Then for any $|E|<2$, any $k\ge 1$ and any compactly supported
continuous test function $O:\mathbb{R}^k\to\mathbb{R}$ we have

$$\lim_{N\to\infty}\int_{\mathbb{R}^k}d\alpha_1\dots d\alpha_k\;O(\alpha_1,\dots,\alpha_k)\Big(p^{(k)}_{v,N}-p^{(k)}_{w,N}\Big)\Big(E+\frac{\alpha_1}{N},\dots,E+\frac{\alpha_k}{N}\Big)=0. \tag{4.6}$$



4.1 Proof of the Green function comparison Theorem 4.1 

The basic idea is that we estimate the effect of changing matrix elements of the resolvent one by one by a
resolvent expansion. Since each matrix element has a typical size of $N^{-1/2}$ and the resolvents are essentially
bounded thanks to (4.1), a resolvent expansion up to the fourth order will identify the change of each element
with a precision $O(N^{-5/2})$ (modulo some tiny corrections of order $N^{O(\tau)}$). The expectation values of the
terms up to fourth order involve only the first four moments of the single entry distribution, which can be
directly compared. The error terms are negligible even when we sum them up $N^2$ times, the number of
comparison steps needed to replace all matrix elements.

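To start the detailed proof, we first need an estimate of the type (4.1) for the resolvents of all intermediate matrices. From the trivial bound

$$\mathrm{Im} \left(\frac{1}{H - E - i\eta}\right)_{jj} \le \frac{y}{\eta}\; \mathrm{Im}\left(\frac{1}{H-E-iy}\right)_{jj}, \qquad \eta \le y,$$

and from (4.1) we have the following a priori bound

$$\mathbb{P}\left( \max_{0\le\gamma\le\gamma(N)}\; \max_{1\le k\le N}\; \sup_{|E|\le 2-\kappa,\; \eta\ge N^{-1-\varepsilon}} \left| \mathrm{Im}\left(\frac{1}{H_\gamma - E \pm i\eta}\right)_{kk}\right| \le N^{3\tau+\varepsilon}\right) \ge 1 - CN^{-c\log\log N}. \tag{4.7}$$

Note that the supremum over $\eta$ can be included by establishing the estimate first for a fine grid of $\eta$'s with spacing $N^{-10}$ and then extending the bound to all $\eta$ by using that the Green functions are Lipschitz continuous in $\eta$ with a Lipschitz constant $\eta^{-2}$.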

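Let $\lambda_m$ and $u_m$ denote the eigenvalues and eigenvectors of $H_\gamma$, then by the definition of the Green function, we have

$$\left| \left(\frac{1}{H_\gamma - z}\right)_{jk}\right| \le \sum_{m=1}^N \frac{|u_m(j)||u_m(k)|}{|\lambda_m - z|} \le \left[ \sum_{m=1}^N \frac{|u_m(j)|^2}{|\lambda_m - z|}\right]^{1/2} \left[ \sum_{m=1}^N \frac{|u_m(k)|^2}{|\lambda_m - z|}\right]^{1/2}.$$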
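The Cauchy-Schwarz step above can be checked numerically. The following is a minimal sketch, assuming a small Gaussian symmetric matrix as a stand-in for $H_\gamma$; the helper name `green_entry_bound`, the matrix size and the spectral parameter are illustrative choices, not taken from the text.

```python
import numpy as np

def green_entry_bound(H, z, j, k):
    """Return (|G_jk(z)|, Cauchy-Schwarz bound) for G = (H - z)^{-1}.

    Uses the spectral decomposition G_jk = sum_m u_m(j) conj(u_m(k)) / (lambda_m - z).
    """
    lam, U = np.linalg.eigh(H)            # columns of U are the eigenvectors u_m
    weights = 1.0 / np.abs(lam - z)       # 1 / |lambda_m - z|
    G_jk = np.sum(U[j, :] * np.conj(U[k, :]) / (lam - z))
    bound = (np.sqrt(np.sum(np.abs(U[j, :]) ** 2 * weights))
             * np.sqrt(np.sum(np.abs(U[k, :]) ** 2 * weights)))
    return abs(G_jk), bound

rng = np.random.default_rng(0)
N = 50
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)            # symmetric, entries of size N^{-1/2}
z = 0.3 + 1j / N                          # hypothetical spectral parameter E + i*eta
lhs, rhs = green_entry_bound(H, z, 2, 7)
print(lhs, rhs)
```

The off-diagonal entry is controlled by two diagonal-type sums, which is why the a priori bound on diagonal elements is enough to control all matrix elements.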



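Define a dyadic decomposition

$$U_n = \{ m : 2^{n-1}\eta \le |\lambda_m - E| < 2^n \eta\}, \quad n = 1, 2, \ldots, n_0 := C\log N,$$

$$U_0 = \{ m : |\lambda_m - E| < \eta\}, \qquad U_\infty := \{ m : 2^{n_0}\eta \le |\lambda_m - E|\},$$

and divide the summation over $m$ into $\cup_n U_n$:

$$\sum_{m=1}^N \frac{|u_m(j)|^2}{|\lambda_m - z|} = \sum_n \sum_{m\in U_n} \frac{|u_m(j)|^2}{|\lambda_m - z|} \le C \sum_n \sum_{m\in U_n} \mathrm{Im}\, \frac{|u_m(j)|^2}{\lambda_m - E - i2^n\eta} \le C\sum_n \mathrm{Im} \left(\frac{1}{H_\gamma - E - i2^n\eta}\right)_{jj}. \tag{4.8}$$

Using the estimate (4.1) for $n = 0, 1, \ldots, n_0$ and a trivial bound of $O(1)$ for $n = \infty$, we have proved that

$$\mathbb{P}\left( \sup_{0\le\gamma\le\gamma(N)}\; \sup_{1\le k,\ell\le N}\; \max_{|E|\le 2-\kappa}\; \sup_{\eta\ge N^{-1-\varepsilon}} \left| \left(\frac{1}{H_\gamma - E \pm i\eta}\right)_{k\ell}\right| \le N^{4\tau+\varepsilon}\right) \ge 1 - CN^{-c\log\log N}. \tag{4.9}$$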



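Now we turn to the one by one replacement. For notational simplicity, we will consider the case when the test function $F$ has only $n = 1$ variable and $k_1 = 1$, i.e., we consider the trace of a first order monomial; the general case follows analogously. Consider the telescopic sum of differences of expectations

$$\mathbb{E}F\left(\frac1N \mathrm{Tr}\frac{1}{H^{(v)}-z}\right) - \mathbb{E}F\left(\frac1N\mathrm{Tr}\frac{1}{H^{(w)}-z}\right) = \sum_{\gamma=1}^{\gamma(N)}\left[\mathbb{E}F\left(\frac1N\mathrm{Tr}\frac{1}{H_\gamma - z}\right) - \mathbb{E}F\left(\frac1N\mathrm{Tr}\frac{1}{H_{\gamma-1}-z}\right)\right]. \tag{4.10}$$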

Let $E^{(ij)}$ denote the matrix whose matrix elements are zero everywhere except at the $(i,j)$ position, where it is 1, i.e., $E^{(ij)}_{k\ell} = \delta_{ik}\delta_{j\ell}$. Fix a $\gamma \ge 1$ and let $(i,j)$ be determined by $\phi(i,j) = \gamma$. We will compare $H_{\gamma-1}$ with $H_\gamma$. Note that these two matrices differ only in the $(i,j)$ and $(j,i)$ matrix elements and they can be written as

$$H_{\gamma} = Q + \frac{1}{\sqrt N} V, \qquad V := v_{ij}E^{(ij)} + v_{ji}E^{(ji)},$$

$$H_{\gamma-1} = Q + \frac{1}{\sqrt N} W, \qquad W := w_{ij}E^{(ij)} + w_{ji}E^{(ji)},$$

with a matrix $Q$ that has zero matrix element at the $(i,j)$ and $(j,i)$ positions and where we set $v_{ji} := \bar v_{ij}$ for $i < j$ and similarly for $w$. Define the Green functions

$$R := \frac{1}{Q - z}, \qquad S := \frac{1}{H_\gamma - z}.$$

We first claim that the estimate (4.9) holds for the Green function $R$ as well. To see this, we have, from the resolvent expansion,

$$R = S + N^{-1/2}SVS + \ldots + N^{-9/2}(SV)^9S + N^{-5}(SV)^{10}R.$$

Since $V$ has at most two nonzero elements, when computing the $(k,\ell)$ matrix element of this matrix identity, each term is a finite sum involving matrix elements of $S$ or $R$ and $v_{ij}$, e.g. $(SVS)_{k\ell} = S_{ki}v_{ij}S_{j\ell} + S_{kj}v_{ji}S_{i\ell}$. Using the bound (4.9) for the $S$ matrix elements, the subexponential decay for $v_{ij}$ and the trivial bound $|R_{ij}| \le \eta^{-1}$, we obtain that the estimate (4.9) holds for $R$.

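We can now start proving the main result by comparing the resolvents of $H_{\gamma-1}$ and $H_\gamma$ with the resolvent $R$ of the reference matrix $Q$. By the resolvent expansion,

$$S = R - N^{-1/2}RVR + N^{-1}(RV)^2R - N^{-3/2}(RV)^3R + N^{-2}(RV)^4R - N^{-5/2}(RV)^5S,$$

so we can write

$$\frac1N \mathrm{Tr}\, S = \hat R + \xi, \qquad \xi := \sum_{m=1}^4 N^{-m/2}\hat R^{(m)} + N^{-5/2}\Omega,$$

with

$$\hat R := \frac1N \mathrm{Tr}\, R, \qquad \hat R^{(m)} := (-1)^m\frac1N \mathrm{Tr}\,(RV)^mR, \qquad \Omega := -\frac1N \mathrm{Tr}\,(RV)^5S.$$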
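The fifth-order resolvent expansion is an exact algebraic identity, so it can be verified to machine precision. The sketch below assumes a small real symmetric reference matrix and a single Gaussian entry $v_{ij}$; all sizes, indices and the value of $z$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, i, j = 30, 3, 11
z = 0.2 + 0.05j                               # hypothetical spectral parameter

A = rng.standard_normal((N, N))
Q = (A + A.T) / np.sqrt(2 * N)
Q[i, j] = Q[j, i] = 0.0                       # reference matrix: zero at (i,j) and (j,i)

V = np.zeros((N, N))
V[i, j] = V[j, i] = rng.standard_normal()     # the single entry v_ij = v_ji

I = np.eye(N)
R = np.linalg.inv(Q - z * I)                  # resolvent of Q
S = np.linalg.inv(Q + V / np.sqrt(N) - z * I) # resolvent of Q + N^{-1/2} V

RV = R @ V
# Terms m = 0..4: (-1)^m N^{-m/2} (RV)^m R, followed by the exact remainder with S.
expansion = sum((-1) ** m * N ** (-m / 2) * np.linalg.matrix_power(RV, m) @ R
                for m in range(5))
expansion = expansion - N ** (-2.5) * np.linalg.matrix_power(RV, 5) @ S
max_error = np.max(np.abs(S - expansion))
print(max_error)
```

Only the last term contains $S$; every other term depends on $v_{ij}$ polynomially with coefficients built from $R$, which is independent of $v_{ij}$.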

For each diagonal element in the computation of these traces, the contribution to $\hat R$, $\hat R^{(m)}$ and $\Omega$ is a sum of a few terms. E.g.

$$\hat R^{(2)} = \frac1N \sum_k \Big[ R_{ki}v_{ij}R_{jj}v_{ji}R_{ik} + R_{ki}v_{ij}R_{ji}v_{ij}R_{jk} + R_{kj}v_{ji}R_{ii}v_{ij}R_{jk} + R_{kj}v_{ji}R_{ij}v_{ji}R_{ik}\Big],$$

and similar formulas hold for the other terms. Then we have

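$$\mathbb{E}F\Big(\frac1N \mathrm{Tr}\, S\Big) = \mathbb{E}F(\hat R + \xi) = \mathbb{E}\Big[ F(\hat R) + F'(\hat R)\xi + F''(\hat R)\xi^2 + \ldots + F^{(5)}(\hat R + \xi')\xi^5\Big] = \sum_{m=0}^5 N^{-m/2}\,\mathbb{E}A^{(m)}, \tag{4.11}$$

where $\xi'$ is a number between $0$ and $\xi$ and it depends on $\hat R$ and $\xi$; the $A^{(m)}$'s are defined as

$$A^{(0)} = F(\hat R), \qquad A^{(1)} = F'(\hat R)\hat R^{(1)}, \qquad A^{(2)} = F''(\hat R)(\hat R^{(1)})^2 + F'(\hat R)\hat R^{(2)},$$

and similarly for $A^{(3)}$ and $A^{(4)}$. Finally,

$$A^{(5)} = F'(\hat R)\Omega + F^{(5)}(\hat R + \xi')(\hat R^{(1)})^5 + \ldots.$$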

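The expectation values of the terms $A^{(m)}$, $m \le 4$, with respect to $v_{ij}$ are determined by the first four moments of $v_{ij}$, for example

$$\mathbb{E}A^{(2)} = F'(\hat R)\Big[\frac1N\sum_k R_{ki}R_{jj}R_{ik} + \ldots\Big]\mathbb{E}|v_{ij}|^2 + F''(\hat R)\Big[\frac{1}{N^2}\sum_{k,\ell} R_{ki}R_{jk}R_{\ell j}R_{i\ell} + \ldots\Big]\mathbb{E}|v_{ij}|^2$$

$$\qquad\quad + F'(\hat R)\Big[\frac1N\sum_k R_{ki}R_{ji}R_{jk} + \ldots\Big]\mathbb{E}v_{ij}^2 + F''(\hat R)\Big[\frac{1}{N^2}\sum_{k,\ell} R_{ki}R_{jk}R_{\ell i}R_{j\ell} + \ldots\Big]\mathbb{E}v_{ij}^2.$$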

Note that the coefficients involve up to four derivatives of $F$ and normalized sums of matrix elements of $R$. Using the estimate (4.9) for $R$ and the derivative bounds (4.3) for the typical values of $\hat R$, we see that all these coefficients are bounded by $N^{C(\tau+\varepsilon)}$ with a very large probability, where $C$ is an explicit constant. We use the bound (4.4) for the extreme values of $\hat R$ but this event has a very small probability by (4.9). Therefore, the coefficients of the moments $\mathbb{E}\bar v_{ij}^s v_{ij}^u$, $u + s \le 4$, in the quantities $A^{(0)}, \ldots, A^{(4)}$ are essentially bounded, modulo a factor $N^{C(\tau+\varepsilon)}$. Notice that the fourth moment of $v_{ij}$ appears only in the $m = 4$ term that already has a prefactor $N^{-2}$ in (4.11). Therefore, to compute the $m \le 4$ terms in (4.11) up to a precision $o(N^{-2})$, it is sufficient to know the first three moments of $v_{ij}$ exactly and the fourth moment only with a precision $N^{-\delta}$; if $\tau$ and $\varepsilon$ are chosen such that $C(\tau+\varepsilon) < \delta$, then the discrepancy in the fourth moment is irrelevant.

Finally, we have to estimate the error term $A^{(5)}$. All terms without $\Omega$ can be dealt with as before; after estimating the derivatives of $F$ by $N^{C(\tau+\varepsilon)}$, one can perform the expectation with respect to $v_{ij}$, which is independent of $R$. For the terms involving $\Omega$ one can argue similarly, by appealing to the fact that the matrix elements of $S$ are also essentially bounded by $N^{C(\tau+\varepsilon)}$, see (4.9), and that $v_{ij}$ has subexponential decay. Alternatively, one can use the Hölder inequality to decouple $S$ from the rest and use (4.9) directly, for example:

$$\mathbb{E}|F'(\hat R)\Omega| = \frac1N \mathbb{E}\big|F'(\hat R)\,\mathrm{Tr}\,(RV)^5S\big| \le \frac1N\Big[\mathbb{E}\,(F'(\hat R))^2\,\mathrm{Tr}\,SS^*\Big]^{1/2}\Big[\mathbb{E}\,\mathrm{Tr}\,(RV)^5(VR^*)^5\Big]^{1/2} \le CN^{C(\tau+\varepsilon)}.$$



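Note that exactly the same perturbation expansion holds for the resolvent of $H_{\gamma-1}$, just $v_{ij}$ is replaced with $w_{ij}$ everywhere. By the moment matching condition, the expectation values $\mathbb{E}A^{(m)}$ of terms for $m \le 3$ in (4.11) are identical and the $m = 4$ term differs by $N^{-\delta + C(\tau+\varepsilon)}$. Choosing $\tau = \varepsilon$, we have

$$\left| \mathbb{E}F\left(\frac1N \mathrm{Tr}\frac{1}{H_\gamma - z}\right) - \mathbb{E}F\left(\frac1N \mathrm{Tr}\frac{1}{H_{\gamma-1} - z}\right)\right| \le CN^{-5/2+C\varepsilon} + CN^{-2-\delta+C\varepsilon}.$$

After summing up in (4.10) over the $\gamma(N) \le N^2$ replacement steps, we have thus proved that

$$\left| \mathbb{E}F\left(\frac1N \mathrm{Tr}\frac{1}{H^{(v)} - z}\right) - \mathbb{E}F\left(\frac1N \mathrm{Tr}\frac{1}{H^{(w)} - z}\right)\right| \le CN^{-1/2+C\varepsilon} + CN^{-\delta+C\varepsilon}.$$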



The proof can be easily generalized to functions of several variables. This concludes the proof of Theorem 4.1. 
□ 
4.2 Proof of the correlation function comparison Theorem 4.2

Define an approximate delta function (times $\pi$) at the scale $\eta$ by

$$\theta_\eta(x) := \mathrm{Im}\,\frac{1}{x - i\eta}.$$

We will choose $\eta \sim N^{-1-\varepsilon}$, i.e. slightly smaller than the typical eigenvalue spacing. This means that an observable of the form $\theta_\eta$ has sufficient resolution to detect individual eigenvalues. Moreover, polynomials of such observables detect correlation functions. On the other hand,

$$\frac1N\, \mathrm{Im}\, \mathrm{Tr}\, G(E + i\eta) = \frac1N \sum_i \theta_\eta(\lambda_i - E),$$

therefore expectation values of such observables are covered by (4.5) of Theorem 4.1. The rest of the proof consists of making this idea precise. There are two technicalities to resolve. First, correlation functions involve distinct eigenvalues (see (4.12) below), while polynomials of the resolvent include an overcounting of coinciding eigenvalues. Thus an inclusion-exclusion formula will be needed. Second, although $\eta$ is much smaller than the relevant scale $1/N$, it still does not give pointwise information on the correlation functions. However, the correlation functions in (1.35) are identified only as a weak limit, i.e., tested against a continuous function $O$. The continuity of $O$ can be used to show that the difference between the exact correlation functions and the smeared out ones on the scale $\eta \sim N^{-1-\varepsilon}$ is negligible. This last step requires an a priori upper bound on the density to ensure that not too many eigenvalues fall into an irrelevantly small interval; this bound is given in (4.1), and it will eventually be verified by the local semicircle law.
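The identity between the imaginary part of the normalized trace of the Green function and the sum of approximate delta functions can be checked directly. This is a minimal sketch, assuming a Gaussian symmetric matrix and illustrative values of $E$ and $\eta$; the function name `theta` mirrors $\theta_\eta$ but is otherwise an arbitrary choice.

```python
import numpy as np

def theta(x, eta):
    """Approximate delta function (times pi): Im 1/(x - i*eta) = eta / (x^2 + eta^2)."""
    return eta / (x ** 2 + eta ** 2)

rng = np.random.default_rng(2)
N = 40
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)
E, eta = 0.1, N ** (-1.2)                 # eta slightly below the spacing 1/N

G = np.linalg.inv(H - (E + 1j * eta) * np.eye(N))
lhs = np.trace(G).imag / N                # (1/N) Im Tr G(E + i*eta)
rhs = np.mean(theta(np.linalg.eigvalsh(H) - E, eta))
print(lhs, rhs)
```

Because $\eta$ is below the typical spacing, each term of the sum is appreciable only when an eigenvalue lies within distance of order $\eta$ of $E$.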

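For notational simplicity, the detailed proof will be given only for the case of three point correlation functions; the proof is analogous for the general case. By definition of the correlation function, for any fixed $E, \alpha_1, \alpha_2, \alpha_3$,

$$\mathbb{E}^w \frac{1}{N(N-1)(N-2)} \sum_{i\ne j\ne k} \theta_\eta(\lambda_i - E_1)\theta_\eta(\lambda_j - E_2)\theta_\eta(\lambda_k - E_3) = \int dx_1dx_2dx_3\; p^{(3)}_{w,N}(x_1,x_2,x_3)\,\theta_\eta(x_1-E_1)\theta_\eta(x_2-E_2)\theta_\eta(x_3-E_3), \qquad E_j := E + \frac{\alpha_j}{N}, \tag{4.12}$$

where $\mathbb{E}^w$ indicates expectation w.r.t. the $w$ variables and the sum runs over pairwise distinct indices. By the inclusion-exclusion principle,

$$\mathbb{E}^w \frac{1}{N(N-1)(N-2)} \sum_{i\ne j\ne k} \theta_\eta(\lambda_i - E_1)\theta_\eta(\lambda_j - E_2)\theta_\eta(\lambda_k - E_3) = \mathbb{E}^wA_1 + \mathbb{E}^wA_2 + \mathbb{E}^wA_3, \tag{4.13}$$

where

$$A_1 := \frac{1}{N(N-1)(N-2)} \prod_{j=1}^3 \Big[\sum_{i=1}^N \theta_\eta(\lambda_i - E_j)\Big],$$

$$A_3 := \frac{2}{N(N-1)(N-2)} \sum_{i=1}^N \theta_\eta(\lambda_i - E_1)\theta_\eta(\lambda_i - E_2)\theta_\eta(\lambda_i - E_3)$$

and

$$A_2 := B_1 + B_2 + B_3, \quad\text{with}\quad B_3 := -\frac{1}{N(N-1)(N-2)} \sum_{i=1}^N \theta_\eta(\lambda_i - E_1)\theta_\eta(\lambda_i - E_2)\sum_{k=1}^N \theta_\eta(\lambda_k - E_3),$$

and similarly, $B_1$ consists of terms with $j = k$, while $B_2$ consists of terms with $i = k$.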
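The inclusion-exclusion decomposition into $A_1$, $A_2 = B_1 + B_2 + B_3$ and $A_3$ is a purely combinatorial identity for sums over pairwise distinct indices. The sketch below checks it for arbitrary weights `a`, `b`, `c` (stand-ins for the three families of $\theta_\eta$ values), with the common prefactor $1/(N(N-1)(N-2))$ omitted on both sides.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
N = 6
# a[i], b[i], c[i] play the roles of theta(lambda_i - E_1), theta(lambda_i - E_2),
# theta(lambda_i - E_3); random positive numbers suffice to test the identity.
a, b, c = rng.random(N), rng.random(N), rng.random(N)

# Left side: sum over pairwise distinct (i, j, k).
distinct = sum(a[i] * b[j] * c[k]
               for i, j, k in itertools.permutations(range(N), 3))

A1 = a.sum() * b.sum() * c.sum()          # unrestricted triple sum
B3 = -np.sum(a * b) * c.sum()             # coincidence i = j
B1 = -np.sum(b * c) * a.sum()             # coincidence j = k
B2 = -np.sum(a * c) * b.sum()             # coincidence i = k
A3 = 2 * np.sum(a * b * c)                # triple coincidence, added back twice
print(distinct, A1 + B1 + B2 + B3 + A3)
```

Each piece on the right is a product of traces of products of (imaginary parts of) resolvents, which is the form covered by Theorem 4.1.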
Notice that, modulo a trivial change in the prefactor, $\mathbb{E}^wA_1$ can be approximated by

$$\mathbb{E}^wF\left(\frac1N\, \mathrm{Im}\,\mathrm{Tr}\frac{1}{H^{(w)} - z_1}, \ldots, \frac1N\, \mathrm{Im}\,\mathrm{Tr}\frac{1}{H^{(w)} - z_3}\right), \qquad z_j := E_j + i\eta,$$

where the function $F$ is chosen to be $F(x_1,x_2,x_3) := x_1x_2x_3$ if $\max_j |x_j| \le N^{\varepsilon}$ and it is smoothly cut off to go to zero in the regime $\max_j |x_j| \ge N^{2\varepsilon}$. The difference between the expectation of $F$ and $A_1$ is negligible, since it comes from the regime where $N^{\varepsilon} \le \max_j \frac1N |\mathrm{Im}\,\mathrm{Tr}\,(H^{(w)} - z_j)^{-1}| \le N^2$, which has an exponentially small probability by (4.9) (the upper bound on the Green function always holds since $\eta \ge N^{-2}$). Here the arguments of $F$ are imaginary parts of the trace of the Green function, but this type of function is allowed when applying Theorem 4.1, since

$$\mathrm{Im}\,\mathrm{Tr}\, G(z) = \frac{1}{2i}\big[\mathrm{Tr}\, G(z) - \mathrm{Tr}\, G(\bar z)\big].$$

We remark that the main assumption (4.1) for Theorem 4.1 is satisfied by using one of the local semicircle theorems (e.g. Theorem 2.5 with the choice of $M \sim N$, or Theorem 2.19).
Similarly, we can approximate $\mathbb{E}^wB_3$ by

$$-\mathbb{E}^wG\left(\frac{1}{N^2}\mathrm{Tr}\Big\{\mathrm{Im}\frac{1}{H^{(w)} - z_1}\;\mathrm{Im}\frac{1}{H^{(w)} - z_2}\Big\},\; \frac1N\,\mathrm{Im}\,\mathrm{Tr}\frac{1}{H^{(w)} - z_3}\right),$$

where $G(x_1,x_2) = x_1x_2$ with an appropriate cutoff for large arguments. There are similar expressions for $B_1$, $B_2$ and also for $A_3$, the latter involving the trace of the product of three resolvents. By Theorem 4.1, these expectations w.r.t. $w$ in the approximations of $\mathbb{E}^wA_i$ can be replaced by expectations w.r.t. $v$ with only negligible errors provided that $\eta \ge N^{-1-\varepsilon}$. We have thus proved that

$$\lim_{N\to\infty} \int dx_1dx_2dx_3\,\big[p^{(3)}_{w,N}(x_1,x_2,x_3) - p^{(3)}_{v,N}(x_1,x_2,x_3)\big]\,\theta_\eta(x_1 - E_1)\theta_\eta(x_2 - E_2)\theta_\eta(x_3 - E_3) = 0. \tag{4.14}$$
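Set $\eta = N^{-1-\varepsilon}$ for the rest of the proof. We now show that the validity of (4.14) for any choice of $E, \alpha_1, \alpha_2, \alpha_3$ (recall $E_j = E + \alpha_j/N$) implies that the rescaled correlation functions, $p^{(3)}_{w,N}(E + \beta_1/N, \ldots, E + \beta_3/N)$ and $p^{(3)}_{v,N}(E + \beta_1/N, \ldots, E + \beta_3/N)$, as functions of the variables $\beta_1, \beta_2, \beta_3$, have the same weak limit. Let $O$ be a smooth, compactly supported test function and let

$$O_\eta(\beta_1,\beta_2,\beta_3) := \frac{1}{(\pi N)^3}\int_{\mathbb{R}^3} d\alpha_1d\alpha_2d\alpha_3\; O(\alpha_1,\alpha_2,\alpha_3)\,\theta_\eta\Big(\frac{\beta_1-\alpha_1}{N}\Big)\theta_\eta\Big(\frac{\beta_2-\alpha_2}{N}\Big)\theta_\eta\Big(\frac{\beta_3-\alpha_3}{N}\Big)$$

be its smoothing on scale $N\eta$. Then we can write

$$\int_{\mathbb{R}^3} d\beta_1d\beta_2d\beta_3\; O(\beta_1,\beta_2,\beta_3)\,p^{(3)}_{w,N}\Big(E + \frac{\beta_1}{N}, \ldots, E + \frac{\beta_3}{N}\Big) = \int_{\mathbb{R}^3} d\beta_1d\beta_2d\beta_3\; O_\eta(\beta_1,\beta_2,\beta_3)\,p^{(3)}_{w,N}\Big(E + \frac{\beta_1}{N}, \ldots, E + \frac{\beta_3}{N}\Big)$$

$$\qquad + \int_{\mathbb{R}^3} d\beta_1d\beta_2d\beta_3\; (O - O_\eta)(\beta_1,\beta_2,\beta_3)\,p^{(3)}_{w,N}\Big(E + \frac{\beta_1}{N}, \ldots, E + \frac{\beta_3}{N}\Big). \tag{4.15}$$

The first term on the right side, after the change of variables $x_j = E + \beta_j/N$, is equal to

$$\frac{1}{\pi^3}\int_{\mathbb{R}^3} d\alpha_1d\alpha_2d\alpha_3\; O(\alpha_1,\alpha_2,\alpha_3)\int dx_1dx_2dx_3\; p^{(3)}_{w,N}(x_1,x_2,x_3)\,\theta_\eta(x_1 - E_1)\theta_\eta(x_2 - E_2)\theta_\eta(x_3 - E_3), \tag{4.16}$$

i.e., it can be written as an integral of expressions of the form (4.14) for which limits with $p_{w,N}$ and $p_{v,N}$ coincide.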

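Finally, the second term on the right hand side of (4.15) is negligible. To see this, notice that for any test function $Q$, we have

$$\int_{\mathbb{R}^3} d\beta_1d\beta_2d\beta_3\; Q(\beta_1,\beta_2,\beta_3)\,p^{(3)}_{w,N}\Big(E + \frac{\beta_1}{N}, \ldots, E + \frac{\beta_3}{N}\Big) = N^3\int dx_1dx_2dx_3\; Q\big(N(x_1 - E), N(x_2 - E), N(x_3 - E)\big)\,p^{(3)}_{w,N}(x_1,x_2,x_3)$$

$$\qquad = \Big(1 - \frac1N\Big)^{-1}\Big(1 - \frac2N\Big)^{-1}\mathbb{E}^w\sum_{i\ne j\ne k} Q\big(N(\lambda_i - E), N(\lambda_j - E), N(\lambda_k - E)\big). \tag{4.17}$$

If the test function $Q$ were supported on a ball of size $N^{\varepsilon'}$, $\varepsilon' > 0$, then this last term would be bounded by

$$\|Q\|_\infty\, \mathbb{E}\, \mathcal{N}^3_{CN^{-1+\varepsilon'}}(E) \le C\|Q\|_\infty N^{4\varepsilon'}.$$

Here $\mathcal{N}_\tau(E)$ denotes the number of eigenvalues in the interval $[E - \tau, E + \tau]$ and in the estimate we used the local semicircle law on intervals of size $\tau \ge N^{-1+\varepsilon'}$.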

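Set now $Q := O - O_\eta$. From the definition of $O_\eta$, it is easy to see that the function

$$Q_1(\beta_1,\beta_2,\beta_3) = \big[O(\beta_1,\beta_2,\beta_3) - O_\eta(\beta_1,\beta_2,\beta_3)\big]\prod_{j=1}^3 \mathbf{1}(|\beta_j| \le N^{\varepsilon'})$$

satisfies the bound $\|Q_1\|_\infty \le \|Q\|_\infty = \|O - O_\eta\|_\infty \le CN\eta = CN^{-\varepsilon}$. So choosing $\varepsilon' < \varepsilon/4$, the contribution of $Q_1$ is negligible. Finally, since $O$ is compactly supported, for large $N$ the function $Q_2 = Q - Q_1$ is given by

$$Q_2(\beta_1,\beta_2,\beta_3) = -O_\eta(\beta_1,\beta_2,\beta_3)\Big[1 - \prod_{j=1}^3 \mathbf{1}(|\beta_j| \le N^{\varepsilon'})\Big]$$

and

$$|Q_2| \le C\Big[\prod_{j=1}^3 \frac{1}{1 + \beta_j^2}\Big]\Big[\mathbf{1}(|\beta_1| \ge N^{\varepsilon'}) + \ldots\Big]. \tag{4.18}$$

Hence the contribution of $Q_2$ in the last term of (4.17) is bounded by

$$C\,\mathbb{E}^w\sum_{i,j,k} \frac{N^{-2}}{N^{-2+2\varepsilon'} + (\lambda_i - E)^2}\cdot\frac{N^{-2}}{N^{-2} + (\lambda_j - E)^2}\cdot\frac{N^{-2}}{N^{-2} + (\lambda_k - E)^2} + \ldots. \tag{4.19}$$

From the local semicircle law, Theorem 2.5 or Theorem 2.19, the last term is bounded by $N^{-\varepsilon'}$ up to some logarithmic factor. To see this, note that the Riemann sums for eigenvalues in (4.19) can be replaced with an integral because the resolution scale of the functions involved is at least $N^{-1}$. This completes the proof of Theorem 4.2. □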



4.3 Sketch of the proof of Theorem 1.5 

We again interpolate between $H$ and $H'$ step by step, by replacing the distribution of the matrix elements of $H$ from $\nu$ to $\nu'$ one by one according to a fixed ordering. Let $H^{(\tau)}(h)$ be the matrix where the first $\tau - 1$ elements follow the $\nu'$ distribution, the $\tau$-th entry is $h$ and the remaining ones follow $\nu$. Denote by $\lambda_i(H^{(\tau)}(h))$ the $i$-th eigenvalue of $H^{(\tau)}(h)$. Let

$$F_\tau(h) := F\big(N\lambda_{i_1}(H^{(\tau)}(h)), N\lambda_{i_2}(H^{(\tau)}(h)), \ldots\big)$$

and we prove that

$$\big|\mathbb{E}F_\tau(h) - \mathbb{E}'F_\tau(h')\big| \le CN^{-2-c_0}. \tag{4.20}$$

Since the number of replacement steps is of order $N^2$, this will show (1.77). Let $\tau$ represent the $(pq)$ matrix element and we can drop the index $\tau$.

We will prove that

$$\Big|\frac{\partial^nF}{\partial h^n}(h)\Big| \le CN^{O(c_0)+o(1)} \tag{4.21}$$

for any $n \le 5$. Then, by Taylor expansion,

$$F(h) = \sum_{n=0}^4 \frac{1}{n!}\frac{\partial^nF}{\partial h^n}(0)\,h^n + N^{-5/2+O(c_0)+o(1)} \tag{4.22}$$

since $|h| \le N^{-1/2+o(1)}$ with very high probability. After taking expectations for $h$ with respect to $\nu$ and $\nu'$, since the first four moments match, the contributions of the fully expanded terms in (4.22) coincide and this proves (4.20).

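To see (4.21), we assume for simplicity that $F$ has only one variable and $i_1 = i$. Then

$$\frac{\partial F}{\partial h} = NF'(N\lambda_i)\frac{\partial\lambda_i}{\partial h}.$$

By standard first order perturbation theory, with $h = h_{pq}$,

$$\frac{\partial\lambda_i}{\partial h_{pq}} = 2\,\mathrm{Re}\,\bar u_i(p)u_i(q), \tag{4.23}$$

where $u_i = (u_i(1), u_i(2), \ldots, u_i(N))$ is the eigenfunction belonging to $\lambda_i$. Since the eigenvectors are delocalized,

$$\|u\|_\infty \lesssim N^{-1/2} \tag{4.24}$$

(modulo logarithmic corrections, see (1.31)), so we obtain

$$\Big|\frac{\partial\lambda_i}{\partial h}\Big| \le O(N^{-1}) \tag{4.25}$$

and thus

$$\Big|\frac{\partial F}{\partial h}\Big| \le N\cdot N^{c_0}\cdot N^{-1} = N^{c_0}.$$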
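The first order perturbation formula (4.23) can be compared with a finite difference. The sketch below is an illustration only: it assumes a real symmetric matrix (so that the formula reads $2u_i(p)u_i(q)$), and the size, indices and step `delta` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, q, i = 20, 2, 9, 5                  # matrix size, entry (p, q), eigenvalue index
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)

def eigenvalue(h):
    """i-th eigenvalue when the (p, q) and (q, p) entries are both set to h."""
    Hh = H.copy()
    Hh[p, q] = Hh[q, p] = h
    return np.linalg.eigvalsh(Hh)[i]

h0 = H[p, q]
lam, U = np.linalg.eigh(H)
predicted = 2 * U[p, i] * U[q, i]         # (4.23) in the real symmetric case

delta = 1e-6
finite_diff = (eigenvalue(h0 + delta) - eigenvalue(h0 - delta)) / (2 * delta)
print(predicted, finite_diff)
```

Since the eigenvector is normalized, $|u_i(p)u_i(q)|$ is of order $1/N$ when the mass is spread evenly over the coordinates, which is the content of (4.25).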



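For higher order derivatives, we have to differentiate the eigenfunctions as well. This gives rise to the resonances, for example with $h = h_{pq}$

$$\frac{\partial u_i(p)}{\partial h_{pq}} = \sum_{j\ne i}\frac{1}{\lambda_i - \lambda_j}\,u_j(p)\big[\bar u_j(p)u_i(q) + \bar u_j(q)u_i(p)\big],$$

thus

$$\Big|\frac{\partial u_i(p)}{\partial h}\Big| \lesssim N^{-3/2}\sum_{j\ne i}\frac{1}{|\lambda_i - \lambda_j|} \lesssim N^{-1/2+c_0},$$

assuming that the eigenvalues regularly follow the semicircle law and no two neighboring eigenvalues get closer than $N^{-1-c_0}$, see (1.78). Substituting this bound into the derivative of (4.23), we have

$$\Big|\frac{\partial^2\lambda_i}{\partial h^2}\Big| \le CN^{-1+c_0}.$$

Combining this bound with (4.25), and reinstating the general case when $F$ has more than one variable, we have

$$\Big|\frac{\partial^2F}{\partial h^2}\Big| \le CN\,|\partial_iF|\,|\partial_h^2\lambda_i| + CN^2\,|\partial_{ij}F|\,|\partial_h\lambda_i|\,|\partial_h\lambda_j| \le CN^{2c_0}.$$

The argument for the higher derivatives is similar. The key technical inputs are the delocalization bound on the eigenvectors (4.24) that can be obtained from the local semicircle law and the lower tail estimate (1.78).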



5 Universality for Wigner matrices: putting it together 

In this section we put the previous information together to prove our main result Theorem 5.1 below. We 
will state our most general result from [49]. The same result under somewhat more restrictive conditions 
was proved in our previous papers, e.g. [42, Theorem 2.3], [46, Theorem 3.1] and [48, Theorem 2.2]. 

Recall that $p_N(\lambda_1, \lambda_2, \ldots, \lambda_N)$ denotes the symmetric joint density of the eigenvalues of the $N \times N$ Wigner matrix $H$. For simplicity we will use the formalism as if the joint distribution of the eigenvalues were absolutely continuous with respect to the Lebesgue measure, but it is not necessary for the proof. Recall the definition of the $k$-point correlation functions (marginals) $p^{(k)}_N$ from (1.34). We will use the notation $p^{(k)}_{N,GUE}$ and $p^{(k)}_{N,GOE}$ for the correlation functions of the GUE and GOE ensembles.

We consider the rescaled correlation functions about a fixed energy $E$ under a scaling that guarantees that the local density is one. The sine-kernel universality for the GUE ensemble states that the rescaled correlation functions converge weakly to the determinant of the sine-kernel, $K(x) = \frac{\sin \pi x}{\pi x}$, i.e.,

$$\frac{1}{[\varrho_{sc}(E)]^k}\,p^{(k)}_{N,GUE}\Big(E + \frac{\alpha_1}{N\varrho_{sc}(E)}, \ldots, E + \frac{\alpha_k}{N\varrho_{sc}(E)}\Big) \to \det\big(K(\alpha_\ell - \alpha_j)\big)_{\ell,j=1}^k \tag{5.1}$$

as $N \to \infty$ for any fixed energy $|E| < 2$ in the bulk of the spectrum [72, 22]. A similar result holds for the GOE case; the sine kernel being replaced with a similar but somewhat more complicated universal function, see [70]. Our main result is that universality (5.1) holds for hermitian or symmetric generalized Wigner matrices after averaging a bit in the energy $E$:
Theorem 5.1 [49, Theorem 2.2] Let $H$ be an $N \times N$ symmetric or hermitian generalized Wigner matrix. In the hermitian case we assume that the real and imaginary parts are i.i.d. Suppose that the distribution $\nu$ of the rescaled matrix elements $\sqrt{N}h_{ij}$ has subexponential decay (2.32). Let $k \ge 1$ and $O : \mathbb{R}^k \to \mathbb{R}$ be a continuous, compactly supported function. Then for any $|E| < 2$, we have

$$\lim_{b\to 0}\lim_{N\to\infty}\frac{1}{2b}\int_{E-b}^{E+b} dv\int_{\mathbb{R}^k} d\alpha_1\ldots d\alpha_k\; O(\alpha_1, \ldots, \alpha_k)\,\frac{1}{[\varrho_{sc}(v)]^k}\Big(p^{(k)}_N - p^{(k)}_{N,\#}\Big)\Big(v + \frac{\alpha_1}{N\varrho_{sc}(v)}, \ldots, v + \frac{\alpha_k}{N\varrho_{sc}(v)}\Big) = 0, \tag{5.2}$$

where $\#$ stands for GOE or GUE for the symmetric or hermitian cases, respectively.

Proof. For definiteness, we consider the symmetric case, i.e., the limit will be the Gaussian Orthogonal Ensemble (GOE), corresponding to the parameter $\beta = 1$ in the general formalism. The joint distribution of the eigenvalues $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ is given by the following measure

$$\mu = \mu_N(d\mathbf{x}) = \frac{e^{-N\mathcal{H}(\mathbf{x})}}{Z}\,d\mathbf{x}, \qquad \mathcal{H}(\mathbf{x}) = \sum_{i=1}^N \frac{x_i^2}{4} - \frac1N\sum_{i<j}\log|x_j - x_i| \tag{5.3}$$

and we assume that the eigenvalues are ordered, i.e., $\mu$ is restricted to $\Sigma_N = \{\mathbf{x} \in \mathbb{R}^N : x_1 < x_2 < \ldots < x_N\}$.

Let $H$ be a symmetric Wigner matrix with single entry distribution satisfying the subexponential decay (2.32). We let the matrix evolve according to the matrix valued Ornstein-Uhlenbeck process, (1.54),

$$dH_t = \frac{1}{\sqrt N}\,dB_t - \frac12 H_t\,dt, \qquad H_0 = H,$$

and recall that the distribution of $H_t$, for each fixed $t > 0$, is the same as

$$e^{-t/2}H + (1 - e^{-t})^{1/2}V, \tag{5.4}$$

where $V$ is an independent GOE matrix. The distribution $\nu_t(dx) = u_t(x)dx$ of the matrix elements evolves according to the Ornstein-Uhlenbeck process on $\mathbb{R}$, i.e.,

$$\partial_tu_t = Au_t, \qquad A := \frac12\frac{\partial^2}{\partial x^2} - \frac{x}{2}\frac{\partial}{\partial x}. \tag{5.5}$$

Note that the initial distribution $\nu = \nu_0$ may be singular, but for any $t > 0$ the distribution $\nu_t$ is absolutely continuous.
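The representation (5.4) can be illustrated by sampling. The sketch below assumes a Bernoulli ($\pm N^{-1/2}$) initial matrix as an example of a singular single entry distribution; the helper names `goe`, `bernoulli_wigner` and `ou_matrix` are hypothetical, and the normalization is chosen so that off-diagonal entries have variance $1/N$.

```python
import numpy as np

def goe(N, rng):
    """GOE matrix normalized so that off-diagonal entries have variance 1/N."""
    A = rng.standard_normal((N, N))
    return (A + A.T) / np.sqrt(2 * N)

def bernoulli_wigner(N, rng):
    """Symmetric matrix with entries +-N^{-1/2} (a singular entry distribution)."""
    B = rng.choice([-1.0, 1.0], size=(N, N))
    B = np.triu(B) + np.triu(B, 1).T
    return B / np.sqrt(N)

def ou_matrix(H, t, rng):
    """Sample e^{-t/2} H + (1 - e^{-t})^{1/2} V with an independent GOE matrix V."""
    N = H.shape[0]
    return np.exp(-t / 2) * H + np.sqrt(1 - np.exp(-t)) * goe(N, rng)

rng = np.random.default_rng(5)
N, t = 200, 0.1
H_t = ou_matrix(bernoulli_wigner(N, rng), t, rng)
offdiag = H_t[np.triu_indices(N, 1)]
scaled_variance = N * np.mean(offdiag ** 2)   # should be close to 1
print(scaled_variance)
```

The weights $e^{-t}$ and $1 - e^{-t}$ add up to one, so the flow keeps the variance of the off-diagonal entries fixed while smoothing their distribution.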

The Ornstein-Uhlenbeck process (5.5) induces [31] the Dyson Brownian motion on the eigenvalues with a generator given by

$$L = \sum_{i=1}^N \frac{1}{2N}\partial_i^2 + \sum_{i=1}^N\Big(-\frac{\beta}{4}x_i + \frac{\beta}{2N}\sum_{j\ne i}\frac{1}{x_i - x_j}\Big)\partial_i \tag{5.6}$$

acting on $L^2(\mu)$. The measure $\mu$ is invariant and reversible with respect to the dynamics generated by $L$. Denote the distribution of the eigenvalues at time $t$ by $f_t(\mathbf{x})\mu(d\mathbf{x})$. Then $f_t$ satisfies

$$\partial_tf_t = Lf_t \tag{5.7}$$

with initial condition $f_0$ given by the eigenvalue density of the Wigner matrix $H$. With the previous notations, $p_N = f_0\mu_N$, where $p_N$ and hence $f_0$ may be singular with respect to the Lebesgue measure. Due to $\beta \ge 1$, the eigenvalues do not cross, i.e., the dynamics (5.7) is well defined on $\Sigma_N$. By using a straightforward symmetrization, one can extend the equilibrium measure, the density functions and the dynamics to the whole $\mathbb{R}^N$. We will use the formalism of ordered eigenvalues everywhere, except in the definition of the correlation functions (1.34), where the symmetrized version is easier. With a small abuse of notations, we will disregard this difference.

Theorem 5.1 was originally proved in [42] for standard Wigner matrices and under more restrictive 
conditions on the single entry distribution. Here we present a more streamlined proof, following [48] but for 
notational simplicity we consider the case of standard Wigner matrices only. The main part of the proof of 
Theorem 5.1 consists of three steps: 

Step 1. First we show that there exists an $\varepsilon_0 > 0$ such that the correlation functions of any Wigner ensemble with a Gaussian convolution of variance $t \sim N^{-\varepsilon_0}$ coincide with the GOE. In other words, any ensemble of the form (5.4) with $t \ge N^{-\varepsilon_0}$ (and with subexponential decay on the matrix elements of $H$) has a universal local statistics.

Step 2. Set $t = N^{-\varepsilon_0}$. We then show that for any given Wigner matrix $H$ we can find another Wigner matrix $\widetilde H$ such that the first three moments of $H$ and $\widetilde H_t = e^{-t/2}\widetilde H + (1 - e^{-t})^{1/2}V$ coincide and the fourth moments are close by $O(N^{-\varepsilon_0})$.

Step 3. Theorem 4.2, which was the corollary of the Green function comparison theorem, shows that the local correlation functions of $H$ and $\widetilde H_t$ from Step 2 coincide. Together with Step 1, this will complete the proof of Theorem 5.1. □

Now we formulate the statements in Step 1 and Step 2 more precisely; Step 3 is already completed.

5.1 Step 1: Universality for Gaussian convolutions 

This step is just an application of Theorem 3.1, we formulate it for our special case: 

Theorem 5.2 Suppose that the probability distribution of the initial symmetric Wigner matrix $H$ has subexponential decay (2.32) with some exponent $\vartheta$ and let $H_t$ be given by the Gaussian convolution (5.4). Let $p^{(k)}_{t,N}$ denote the $k$-point correlation function of the eigenvalues of $H_t$. Then there exists an $\varepsilon_0 > 0$, depending on the parameters in (2.32), such that for any $t \ge N^{-\varepsilon_0}$ we have

$$\lim_{b\to 0}\lim_{N\to\infty}\frac{1}{2b}\int_{E-b}^{E+b} dv\int_{\mathbb{R}^k} d\alpha_1\ldots d\alpha_k\; O(\alpha_1, \ldots, \alpha_k)\,\frac{1}{[\varrho_{sc}(v)]^k}\Big(p^{(k)}_{t,N} - p^{(k)}_{N,GOE}\Big)\Big(v + \frac{\alpha_1}{N\varrho_{sc}(v)}, \ldots, v + \frac{\alpha_k}{N\varrho_{sc}(v)}\Big) = 0 \tag{5.8}$$

for any continuous, compactly supported test function $O$.

We remark that the threshold exponent $\varepsilon_0$ can be given explicitly. If we use the local semicircle law from Theorem 2.5 and its corollary, Theorem 2.7, then $\varepsilon_0$ can be chosen as any number smaller than 1/7. Using the strong local semicircle law, Theorem 2.19, the exponent $\varepsilon_0$ can be chosen as any number smaller than 1.

Proof. We just have to check that the assumptions of Theorem 3.1 are satisfied. First, the Hamiltonian of the equilibrium measure is (5.3), and it is clearly of the form (3.5), so Assumption I is automatic. The entropy assumption (3.11) on the initial state $f_0$ may not be satisfied since $f_0$ can even be singular. However, by the semigroup property of the OU flow, one can consider the initial condition $f_{t_0}$ for the flow $f_t$, $t \ge t_0$, for some $t_0 \le N^{-\varepsilon_0}$ since the statement of Theorem 3.1 concerns only the time $t \ge N^{-\varepsilon_0}$. Thus it is sufficient to show that the entropy condition is satisfied for some very small $t_0 \ll N^{-\varepsilon_0}$.

To see this, let $\nu_t$ denote the single entry distribution of $H_t$ and $\bar\nu_t$ the probability measure of the matrix $H_t$. Let $\bar\nu_{GOE}$ denote the probability measure of the GOE ensemble and $\nu_{GOE}$ the probability measure of its $ij$-th element, which is a Gaussian measure with mean zero and variance $1/N$. Since the dynamics of matrix elements are independent (subject to the symmetry condition), and the entropy is additive, we have the identity

$$\int\log\Big(\frac{d\bar\nu_t}{d\bar\nu_{GOE}}\Big)d\bar\nu_t = \sum_{i\le j}\int\log\Big(\frac{d\nu_t}{d\nu_{GOE}}\Big)d\nu_t, \tag{5.9}$$

since the summation runs over the indices of all independent elements $1 \le i \le j \le N$. Clearly, the process $t \to \nu_t$ is an Ornstein-Uhlenbeck process and each entropy term on the right hand side of (5.9) is bounded by $CN$ provided that $t \ge t_0 := 1/N$ and $\nu_0$ has a subexponential decay. Since the entropy of the marginal distribution on the eigenvalues is bounded by the entropy of the total measure on the matrix, we have proved that

$$\int f_{1/N}\log f_{1/N}\,d\mu \le CN^3, \tag{5.10}$$

and this verifies (3.11). Therefore, in order to apply Theorem 3.1, we only have to verify the Assumptions II, III and IV. Clearly, Assumptions II and IV follow from the local semicircle law, Theorem 2.5 with $\varrho(E) = \varrho_{sc}(E)$ (note that in the bounded variance case $M \sim N$), and Assumption III was proven in Theorem 2.7. Now we can apply Theorem 3.1 and we get (5.8) with any $\varepsilon_0 < \varepsilon$, where $\varepsilon$ is obtained from Theorem 2.7, i.e. $\varepsilon_0$ can be any number smaller than 1/7. If we use the strong local semicircle law, Theorem 2.19, then (2.114) implies that Assumption III, (3.9), holds with any $a < 1/2$ and thus $\varepsilon_0$ can be any number smaller than 1. □

5.2 Step 2: Matching Lemma

For any real random variable $\xi$, denote by $m_k(\xi) = \mathbb{E}\xi^k$ its $k$-th moment. By the Schwarz inequality, the sequence of moments $m_1, m_2, \ldots$ consists of not arbitrary numbers, for example $|m_1|^k \le m_k$ and $m_2^2 \le m_4$, etc., but there are more subtle relations. For example, if $m_1 = 0$, then

$$m_4m_2 - m_3^2 \ge m_2^3, \tag{5.11}$$

which can be obtained by

$$m_3^2 = \big[\mathbb{E}\xi^3\big]^2 = \big[\mathbb{E}\xi(\xi^2 - 1)\big]^2 \le \big[\mathbb{E}\xi^2\big]\big[\mathbb{E}(\xi^2 - 1)^2\big] = m_2(m_4 - 2m_2 + 1)$$

and noticing that (5.11) is scale invariant, so it is sufficient to prove it for $m_2 = 1$. In fact, it is easy to see that (5.11) saturates if and only if the support of $\xi$ consists of exactly two points (apart from the trivial case when $\xi \equiv 0$).
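The moment inequality (5.11) and its saturation by two-point distributions can be checked with exact rational arithmetic. The following is a small sketch; the three test distributions and the helper names `moments` and `gap` are illustrative choices.

```python
from fractions import Fraction

def moments(points, probs, kmax=4):
    """Exact moments m_1, ..., m_kmax of a finitely supported distribution."""
    return [sum(p * x ** k for x, p in zip(points, probs))
            for k in range(1, kmax + 1)]

def gap(points, probs):
    """Return m_4*m_2 - m_3^2 - m_2^3, which (5.11) asserts is >= 0 when m_1 = 0."""
    m1, m2, m3, m4 = moments(points, probs)
    assert m1 == 0, "the inequality is stated for centered variables"
    return m4 * m2 - m3 ** 2 - m2 ** 3

half, third = Fraction(1, 2), Fraction(1, 3)
# Symmetric Bernoulli: two points, so equality is expected.
bernoulli = gap([Fraction(-1), Fraction(1)], [half, half])
# Asymmetric two-point distribution: still equality.
skewed_two_point = gap([Fraction(-1), Fraction(2)], [Fraction(2, 3), third])
# Three support points: strict inequality is expected.
three_point = gap([Fraction(-1), Fraction(0), Fraction(1)], [third, third, third])
print(bernoulli, skewed_two_point, three_point)
```

The two-point examples sit exactly on the boundary of the admissible moment region, which is why a Gaussian component cannot be split off without changing the fourth moment.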

This restriction shows that given a sequence of four admissible moments, $m_1 = 0$, $m_2 = 1$, $m_3$, $m_4$, there may not exist a Gaussian divisible random variable $\xi$ with these moments; e.g. the moment sequence $(m_1, m_2, m_3, m_4) = (0, 1, 0, 1)$ uniquely characterizes the standard Bernoulli variable ($\xi = \pm 1$ with 1/2-1/2 probability). However, if we allow a bit of room in the fourth moment, then one can match any four admissible moments with a small Gaussian convolution. This is the content of the next lemma which completes Step 2.

Lemma 5.3 [48, Lemma 6.5] Let $m_3$ and $m_4$ be two real numbers such that

$$m_4 - m_3^2 - 1 \ge 0, \qquad m_4 \le C_2 \tag{5.12}$$

for some positive constant $C_2$. Let $\xi^G$ be a Gaussian random variable with mean 0 and variance 1. Then for any sufficiently small $\gamma > 0$ (depending on $C_2$), there exists a real random variable $\xi_\gamma$ with subexponential decay and independent of $\xi^G$, such that the first four moments of

$$\xi' = (1 - \gamma)^{1/2}\xi_\gamma + \gamma^{1/2}\xi^G \tag{5.13}$$

are $m_1(\xi') = 0$, $m_2(\xi') = 1$, $m_3(\xi') = m_3$ and $m_4(\xi')$, and

$$|m_4(\xi') - m_4| \le C\gamma \tag{5.14}$$

for some $C$ depending on $C_2$.

Proof. It is easy to see by an explicit construction that:

Claim: For any given numbers $m_3, m_4$, with $m_4 - m_3^2 - 1 \ge 0$ there is a random variable $X$ with first four moments $0, 1, m_3, m_4$ and with subexponential decay.

For any real random variable $\zeta$, independent of $\xi^G$, and with the first 4 moments being $0$, $1$, $m_3(\zeta)$ and $m_4(\zeta) < \infty$, the first 4 moments of

$$\zeta' = (1 - \gamma)^{1/2}\zeta + \gamma^{1/2}\xi^G \tag{5.15}$$

are $0$, $1$,

$$m_3(\zeta') = (1 - \gamma)^{3/2}m_3(\zeta) \tag{5.16}$$

and

$$m_4(\zeta') = (1 - \gamma)^2m_4(\zeta) + 6\gamma - 3\gamma^2. \tag{5.17}$$

Using the Claim, we obtain that for any $\gamma > 0$ there exists a real random variable $\xi_\gamma$ such that the first four moments are $0$, $1$,

$$m_3(\xi_\gamma) = (1 - \gamma)^{-3/2}m_3 \tag{5.18}$$

and

$$m_4(\xi_\gamma) = m_3(\xi_\gamma)^2 + (m_4 - m_3^2).$$

With $m_4 \le C_2$, we have $m_3^2 \le C_2^{3/2}$, thus

$$|m_4(\xi_\gamma) - m_4| \le C\gamma \tag{5.19}$$

for some $C$ depending on $C_2$. Hence with (5.16) and (5.17), we obtain that $\xi' = (1 - \gamma)^{1/2}\xi_\gamma + \gamma^{1/2}\xi^G$ satisfies $m_3(\xi') = m_3$ and (5.14). This completes the proof of Lemma 5.3. □
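The bookkeeping in the proof can be traced numerically using only the moment formulas (5.16)-(5.18). This sketch does not construct the random variable $\xi_\gamma$ itself; the target moments `m3`, `m4` and the values of `gamma` are arbitrary illustrative choices satisfying (5.12).

```python
def convolved_moments(m3_zeta, m4_zeta, gamma):
    """Third and fourth moments of (1-gamma)^{1/2} zeta + gamma^{1/2} G, cf. (5.16)-(5.17)."""
    m3 = (1 - gamma) ** 1.5 * m3_zeta
    m4 = (1 - gamma) ** 2 * m4_zeta + 6 * gamma - 3 * gamma ** 2
    return m3, m4

def matching_moments(m3, m4, gamma):
    """Moments required of xi_gamma by (5.18) and the display following it."""
    assert m4 - m3 ** 2 - 1 >= 0, "target moments must satisfy (5.12)"
    m3_g = (1 - gamma) ** -1.5 * m3
    m4_g = m3_g ** 2 + (m4 - m3 ** 2)     # keeps m4_g - m3_g^2 - 1 >= 0, so the Claim applies
    return m3_g, m4_g

m3, m4 = 0.5, 2.0                         # hypothetical target moments
results = {}
for gamma in (0.1, 0.01, 0.001):
    m3_g, m4_g = matching_moments(m3, m4, gamma)
    results[gamma] = convolved_moments(m3_g, m4_g, gamma)
    print(gamma, results[gamma])
```

The third moment is matched exactly for every `gamma`, while the fourth moment deviates from the target by an amount proportional to `gamma`, in line with (5.14).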
A Large Deviation Estimates: proof of Lemma 2.12



The estimates in Lemma 2.12 are weaker than the corresponding results of Hanson and Wright [62], used 
in [41, 46], but they require only independent, not necessarily identically distributed random variables with 
subexponential decay, moreover the proofs are much simpler. Thus the Gaussian decay requirement of 
Hanson and Wright is relaxed to subexponential, but the tail probability estimate is weaker. 

Proof of (2.68). Without loss of generality, we may assume that $\sigma = 1$. The assumption (2.67) implies that the $k$-th moment of $a_i$ is bounded by:



E|a,|'= < (CkY 



for some C > 0. 

First, for p G N, we estimate 



E 



N 



With the Marcinkiewicz-Zygmund inequality, for p > 2, we have 

P / 



E 



p/2 



(A.l) 



(A.2) 



(A.3) 



(for the estimate of the constant, see e.g. Exercise 2.2.30 of [95]). Inserting (A.l) into (A.3), we have 



E 



(A.4) 



which implies (2.68) by choosing $p = \log N$ and applying a high moment Markov inequality. □

Proof of (2.69). Notice that the random variables $|a_i|^2 - 1$ ($1 \le i \le N$) are independent random variables with mean 0 and variance less than some constant $C$. Furthermore, the $k$-th moment of $|a_i|^2 - 1$ is bounded as

E(la,l^- 1)'' < (Cfc)^"^ (A.5) 
Then following the proof of (2.68) with $|a_i|^2 - 1$ replacing $a_i$, we obtain (2.69).

Proof of (2.70). For any $p \in \mathbb{N}$, $p \ge 2$, we estimate



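$$\mathbb{E}\Big|\sum_{i>j}\bar a_iB_{ij}a_j\Big|^p \equiv \mathbb{E}\Big|\sum_i \bar a_ib_i\Big|^p, \tag{A.6}$$

where $b_i := \sum_{j<i}B_{ij}a_j$. Note that $a_i$ and $b_i$ are independent for any fixed $i$. By the definition,

$$X_n \equiv \sum_{i=1}^n \bar a_ib_i \tag{A.7}$$

is a martingale. Using the Burkholder inequality, we have that

$$\mathbb{E}\Big|\sum_i \bar a_ib_i\Big|^p \le (Cp)^{3p/2}\,\mathbb{E}\Big(\sum_i |\bar a_ib_i|^2\Big)^{p/2} \tag{A.8}$$

(for the constant, see Section VII.3 of [86]). By the generalized Minkowski inequality, by the independence of $a_i$ and $b_i$ and using (A.1), we have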





2/p 




2/p 


i 




-E 

i 



i 

Using (A. 4), we have 

E(ic.n<M+")^(Ei^': 

Combining this with (A. 8) we obtain 



E(|a,r)E(|e,r) 



p/2 



2/p 



< 



icpr E 



mm 



E 



E 



p/2 



' 3 



Then choosing $p = \log N$ and applying the Markov inequality, we obtain (2.70).

B Proof of Theorem 3.1 



Recalling the notations around (3.56), we start with the identity (3.55) 

L, ^ L "^"^ ■ ■ • • • ■ ' ""^''^"^ + W^y ■ 

We have to show that 

'•E+b A JP' 



Nq{E) 



meS„ i=l 



lim 



i-E+b AT^i c ^ 



0. 



meS„ 1=1 

Let $M$ be an $N$-dependent parameter chosen at the end of the proof. Let

$$S_n(M) := \{\mathbf{m} \in S_n,\; m_n \le M\}, \qquad S_n^c(M) := S_n \setminus S_n(M),$$

and note that $|S_n(M)| \le M^{n-1}$. To prove (B.2), it is sufficient to show that

I E-b 



lim 

N-¥oo 



nE+b jpi f ^ 



meS„{M) i=l 



0. 



2/p 



(A.9) 



□ 



(B.l) 



(B.2) 



(B.3) 
lim y



i-E+b 1 /■ N 



= 



and that 

cE+b ^ f N 
I E-b 

holds for any r > N^"^^^^ (note that r = oo corresponds to the equihbrium, f^o = !)• 
Step 1: Small m case; proof of (B.3). 

After performing the $dE'$ integration, we will eventually apply Theorem 3.3 to the function

0(^1,^2,...):= / 0(?/,Mi,U2,...,)d?/, 

i.e., to the quantity 



(B.4) 



(B.5) 



for each fixed i and m. 

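For any $E$ and $0 < \xi < b$ define sets of integers $J = J_{E,b,\xi}$ and $J^\pm = J^\pm_{E,b,\xi}$ by

$$J := \{i : \gamma_i \in [E - b, E + b]\}, \qquad J^\pm := \{i : \gamma_i \in [E - (b \pm \xi), E + b \pm \xi]\},$$

where $\gamma_i$ was defined in (3.8). Clearly $J^- \subset J \subset J^+$. With these notations, we have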



rE+b 1 771/ ^ fa+o j p/ 

JE-b ^b JE-b j+ 



rE+b 



(B.6) 



The error term $\Xi^+$, defined by (B.6) indirectly, comes from those indices $i \notin J^+$ for which $x_i \in [E - b, E + b] + O(N^{-1})$, since $Y_{i,\mathbf{m}}(E', \mathbf{x}) = 0$ unless $|x_i - E'| \le C/N$, the constant depending on the support of $O$. Thus

$$|\Xi^+(\mathbf{x})| \le CN^{-1}b^{-1}\,\#\{i : |x_i - \gamma_i| \ge \xi/2\}$$

for any sufficiently large $N$, assuming $\xi \gg 1/N$ and using that $O$ is a bounded function. The additional $N^{-1}$ factor comes from the $dE'$ integration. Taking the expectation with respect to the measure $f_\tau d\mu$, we get

$$\int |\Xi^+(\mathbf{x})|\,f_\tau d\mu \le Cb^{-1}\xi^{-2}N^{-1}\int\sum_i (x_i - \gamma_i)^2f_\tau d\mu = Cb^{-1}\xi^{-2}N^{-1-2a} \tag{B.7}$$

using Assumption III (3.9). We can also estimate



rE+b 



dE' 
'2b 



J2 y.,m{E',^) + cimrv^ \j-\ + s+^(x) 



< 



/ ^ E ^^,m{E',^) + c(m)-i|,/+ \ ,/-| + C{Nb)-'\J\ J 



(B., 



-2t„(x), 
where the error term $\Xi^-$, defined by (B.8), comes from indices $i \in J^-$ such that $x_i \notin [E - b, E + b] + O(1/N)$. It satisfies the same bound (B.7) as $\Xi^+$. By the continuity of $\varrho$, the density of the $\gamma_i$'s is bounded by $CN$, thus $|J^+ \setminus J^-| \le CN\xi$ and $|J \setminus J^-| \le CN\xi$. Therefore, summing up the formula (B.5) for $i \in J$, we obtain from (B.6) and (B.8)



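$$
\int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\, f_\tau\, d\mu
\le \frac{1}{2b} \int \frac{1}{N} \sum_{i\in J} \widetilde{O}\bigl( N(x_i - x_{i+m_2}), \ldots, N(x_i - x_{i+m_n}) \bigr) f_\tau\, d\mu + C b^{-1} \xi + C b^{-1} \xi^{-2} N^{-1-2\mathfrak{a}}
$$

for each $\mathbf{m} \in S_n$. A similar lower bound can be obtained analogously, and after choosing $\xi = N^{-1/3}$ we obtain

$$
\left| \int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\, f_\tau\, d\mu - \frac{1}{2b} \int \frac{1}{N} \sum_{i\in J} \widetilde{O}\bigl( N(x_i - x_{i+m_2}), \ldots, N(x_i - x_{i+m_n}) \bigr) f_\tau\, d\mu \right| \le C N^{-1/3} \tag{B.9}
$$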



for each $\mathbf{m} \in S_n$, where $C$ depends on $b$. It is possible to optimize the choice of $\xi$, depending on $b$ and $\mathfrak{a}$, and this would yield the effective bound mentioned after Theorem 3.1, but in this presentation we will not pursue the effective bound; see [46] for more details.
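
The counting bound behind (B.7) is a Chebyshev-type estimate. As a brief worked step (spelled out here as a sketch, not quoted from the source): every index $i$ with $|x_i - \gamma_i| \ge \xi/2$ contributes at least $\xi^2/4$ to the sum of squared displacements, so

```latex
\#\bigl\{ i : |x_i - \gamma_i| \ge \xi/2 \bigr\}
  \;\le\; \sum_{i=1}^N \frac{(x_i - \gamma_i)^2}{(\xi/2)^2}
  \;=\; \frac{4}{\xi^2} \sum_{i=1}^N (x_i - \gamma_i)^2 .
```

Integrating this against $f_\tau\, d\mu$ and inserting the bound from Assumption III (3.9) then gives an estimate of the form (B.7).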
Adding up (B.9) for all $\mathbf{m} \in S_n(M)$, we get



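$$
\left| \int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{\mathbf{m}\in S_n(M)} \sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\, f_\tau\, d\mu - \frac{1}{2b} \int \sum_{\mathbf{m}\in S_n(M)} \frac{1}{N} \sum_{i\in J} \widetilde{O}\bigl( N(x_i - x_{i+m_2}), \ldots, N(x_i - x_{i+m_n}) \bigr) f_\tau\, d\mu \right| \le C M^{n-1} N^{-1/3} \tag{B.10}
$$

and the same estimate holds for the equilibrium, i.e., if we set $\tau = \infty$ in (B.10). Subtracting these two formulas and applying (3.30) from Theorem 3.3 to each summand in the second term in (B.9), we conclude that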

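$$
\left| \int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{\mathbf{m}\in S_n(M)} \sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\,(f_\tau - 1)\, d\mu \right| \le C M^{n-1} \bigl( N^{-1/3} + N^{-\delta/6} \bigr). \tag{B.11}
$$

Choosing

$$
M := N^{\min\{1/3,\, \delta/6\}/n}, \tag{B.12}
$$

we obtain that (B.11) vanishes as $N \to \infty$, and this proves (B.3).

Step 2. Large m case; proof of (B.4).

For a fixed $y \in \mathbb{R}$, $\ell > 0$, let

$$
\mathcal{X}(y,\ell) := \sum_{i=1}^N \mathbf{1}\Bigl\{ x_i \in \Bigl[ y - \frac{\ell}{N},\, y + \frac{\ell}{N} \Bigr] \Bigr\}
$$

denote the number of points in the interval $[y - \ell/N, y + \ell/N]$. Note that for a fixed $\mathbf{m} = (m_2, \ldots, m_n)$, we have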

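$$
\sum_{i=1}^N |Y_{i,\mathbf{m}}(E',\mathbf{x})| \le C \cdot \mathcal{X}(E',\ell) \cdot \mathbf{1}\bigl( \mathcal{X}(E',\ell) \ge m_n \bigr) \le C \sum_{m=m_n}^{\infty} m \cdot \mathbf{1}\bigl( \mathcal{X}(E',\ell) \ge m \bigr), \tag{B.13}
$$

where $\ell$ denotes the maximum of $|u_1| + \ldots + |u_n|$ in the support of $O(u_1, \ldots, u_n)$.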
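
The second inequality in (B.13) is elementary: for a nonnegative integer count and a threshold, the term with summation index equal to the count itself appears in the sum whenever the count reaches the threshold. A minimal numerical sketch of this observation follows; the function names are illustrative and do not come from the source.

```python
def lhs(count: int, threshold: int) -> int:
    """count * 1(count >= threshold)."""
    return count if count >= threshold else 0


def rhs(count: int, threshold: int) -> int:
    """Sum over m >= threshold of m * 1(count >= m); terms with m > count vanish."""
    return sum(m for m in range(threshold, count + 1))


def inequality_holds(max_count: int = 40) -> bool:
    """Check lhs <= rhs for 0 <= count <= max_count and 1 <= threshold <= max_count."""
    return all(
        lhs(x, k) <= rhs(x, k)
        for x in range(max_count + 1)
        for k in range(1, max_count + 1)
    )
```

Calling `inequality_holds()` exhaustively checks the inequality on a small grid; it is a sanity check of the bookkeeping, not a substitute for the one-line argument above.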

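Since the summation over all increasing sequences $\mathbf{m} = (m_2, \ldots, m_n) \in \mathbb{N}_+^{n-1}$ with a fixed $m_n$ contains at most $m_n^{n-2}$ terms, we have

$$
\sum_{\mathbf{m}\in S_n^c(M)} \left| \int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{i=1}^N Y_{i,\mathbf{m}}(E',\mathbf{x})\, f_\tau\, d\mu \right| \le C \sum_{m=M}^{\infty} m^{n-1} \int_{E-b}^{E+b} \frac{dE'}{2b} \int \mathbf{1}\bigl( \mathcal{X}(E',\ell) \ge m \bigr) f_\tau\, d\mu. \tag{B.14}
$$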

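Now we use Assumption IV for the interval $I = [E' - N^{-1+\sigma}, E' + N^{-1+\sigma}]$ with $\sigma := \frac{1}{2n} \min\{1/3, \delta/6\}$. Clearly $\mathcal{N}_I \ge \mathcal{X}(E',\ell)$ for sufficiently large $N$, thus we get from (3.10) that

$$
\sum_{m=M}^{\infty} m^{n-1} \int \mathbf{1}\bigl( \mathcal{X}(E',\ell) \ge m \bigr) f_\tau\, d\mu \le C_a \sum_{m=M}^{\infty} m^{n-1} \Bigl( \frac{m}{N^{\sigma}} \Bigr)^{-a}
$$

holds for any $a \in \mathbb{N}$. By the choice of $\sigma$, we get that $\sqrt{m} \ge N^{\sigma}$ for any $m \ge M$ (see (B.12)), and thus choosing $a = 2n + 2$, we get

$$
\sum_{m=M}^{\infty} m^{n-1} \int \mathbf{1}\bigl( \mathcal{X}(E',\ell) \ge m \bigr) f_\tau\, d\mu \le \frac{C}{M} \to 0
$$

as $N \to \infty$. Inserting this into (B.14), this completes the proof of (B.4) and the proof of Theorem 3.1. □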
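
Two pieces of arithmetic in Step 2 can be checked numerically: the count of increasing sequences with a fixed last entry, and the convergence of the tail sum once $\sqrt{m} \ge N^{\sigma}$ is used to bound $(m/N^{\sigma})^{-a}$ by $m^{-a/2}$ with $a = 2n+2$. The sketch below is illustrative only. It assumes that the entries satisfy $1 \le m_2 < \ldots < m_n$ (the set $S_n$ is defined outside this section, so the lower bound is an assumption), and the function names are hypothetical.

```python
from itertools import combinations


def count_increasing(last: int, n: int) -> int:
    """Count sequences 1 <= m_2 < ... < m_n with m_n == last (requires n >= 2)."""
    # The entries m_2, ..., m_{n-1} are a strictly increasing choice
    # of n - 2 values from {1, ..., last - 1}.
    return sum(1 for _ in combinations(range(1, last), n - 2))


def tail_sum(M: int, n: int, terms: int = 10_000) -> float:
    """Partial sum over M <= m < M + terms of m^(n-1) * m^(-a/2) with a = 2n + 2."""
    a = 2 * n + 2
    # Each summand equals m^(-2), so the full series is bounded by 2 / M.
    return sum(m ** (n - 1) * m ** (-a / 2) for m in range(M, M + terms))
```

For example, `count_increasing(m, n) <= m ** (n - 2)` can be checked for small `m` and `n`, and `tail_sum(M, n)` can be compared with `2 / M`, in line with the $C/M$ decay used at the end of the proof.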